Encoding device and encoding method, decoding device and decoding method, and program

ABSTRACT

There is provided a decoding device including at least one circuit configured to acquire one or more encoded audio signals including a plurality of channels and/or a plurality of objects and priority information for each of the plurality of channels and/or the plurality of objects, and to decode the one or more encoded audio signals according to the priority information.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of Japanese Priority Patent Application JP 2014-060486 filed Mar. 24, 2014, and Japanese Priority Patent Application JP 2014-136633 filed Jul. 2, 2014, the entire contents of each of which are incorporated herein by reference.

TECHNICAL FIELD

The present technology relates to an encoding device and an encoding method, a decoding device and a decoding method, and a program, and particularly to an encoding device and an encoding method, a decoding device and a decoding method, and a program in which the amount of calculation for decoding an audio signal can be reduced.

BACKGROUND ART

For example, as methods of encoding an audio signal, multi-channel encoding under the International Standards moving picture experts group (MPEG)-2 advanced audio coding (AAC), MPEG-4 AAC, and MPEG-D unified speech and audio coding (USAC) has been proposed (for example, refer to NPL 1 and NPL 2).

CITATION LIST

Non Patent Literature

NPL 1: INTERNATIONAL STANDARD ISO/IEC 14496-3 Fourth edition Sep. 1, 2009 Information technology - Coding of audio-visual objects - Part 3: Audio

NPL 2: INTERNATIONAL STANDARD ISO/IEC 23003-3 First edition Apr. 1, 2012 Information technology - MPEG audio technologies - Part 3: Unified speech and audio coding

SUMMARY OF INVENTION

Technical Problem

Incidentally, there is a demand for an encoding technology that uses more channels than the 5.1-channel surround reproduction of the related art in order to give a greater sense of presence in reproduction, or that transmits a plurality of sound materials (objects).

For example, consider a case of encoding and decoding audio signals of 24 channels and a plurality of objects, and a case of encoding and decoding an audio signal of two channels. In this case, a mobile device having a poor calculation capability can decode the audio signal of two channels in real time, but it may be difficult for the device to decode the audio signals of 24 channels and a plurality of objects in real time.

In current audio codecs such as MPEG-D USAC, since it is necessary to decode the audio signals of all the channels and all the objects, it is difficult to reduce the amount of calculation at the time of decoding. Therefore, there is a problem in that, depending on the device at the decoding side, it is not possible to reproduce the audio signal in real time.

It is desirable to provide an encoding device and an encoding method, a decoding device and a decoding method, and a program in which the amount of calculation for decoding can be reduced.

Solution to Problem

A decoding device according to a first embodiment of the present technology includes at least one circuit configured to acquire one or more encoded audio signals including a plurality of channels and/or a plurality of objects and priority information for each of the plurality of channels and/or the plurality of objects, and to decode the one or more encoded audio signals according to the priority information.

The at least one circuit may be configured to decode according to the priority information at least in part by decoding at least one of the one or more encoded audio signals for which a priority degree indicated by the priority information is equal to or higher than a degree, and refraining from decoding at least one other of the one or more encoded audio signals for which a priority degree indicated by the priority information is less than the degree.

The at least one circuit may be configured to change the degree based at least in part on the priority information for the plurality of channels and/or the plurality of objects.

The at least one circuit may be configured to acquire a plurality of sets of priority information for the one or more encoded audio signals, and the at least one circuit may be configured to decode the one or more encoded audio signals at least in part by selecting one of the sets of priority information and decoding based at least in part on the one set of priority information.

The at least one circuit may be configured to select the one of the sets of priority information according to a calculation capability of the decoding device.

The at least one circuit may be further configured to generate the priority information based at least in part on the encoded audio signal.

The at least one circuit may be configured to generate the priority information based at least in part on a sound pressure or a spectral shape of the audio of the one or more encoded audio signals.

The priority information for the plurality of channels and/or the plurality of objects may comprise, for at least one first channel of the plurality of channels and/or at least one first object of the plurality of objects, priority information indicating different priority degrees of the at least one first channel and/or at least one first object over a period of time, and the at least one circuit may be configured to decode based on the priority information at least in part by determining, for the first channel and/or the first object and at a first time during the period of time, whether or not to decode the first channel and/or the first object at the first time based at least in part on a priority degree for the first channel and/or the first object at the first time and a priority degree for the first channel and/or the first object at another time before or after the first time and during the period of time.

The at least one circuit may be further configured to generate an audio signal for a first time at least in part by adding an output audio signal of a channel or object at the first time and an output audio signal of the channel or object at a second time before or after the first time, wherein the output audio signal of the channel or object for a given time is a signal obtained by the at least one circuit as a result of decoding in a case where decoding of the channel or object for that time is performed, and is zero data in a case where decoding of the channel or object for that time is not performed, and to perform a gain adjustment of the output audio signal of the channel or object at the first time based on the priority information of the channel or object at the first time and the priority information of the channel or object at the second time.

The at least one circuit may be further configured to adjust a gain of a high frequency power value for the channel or object based on the priority information of the channel or object at the first time and the priority information of the channel or object at the second time before or after the first time, and to generate a high frequency component of the audio signal for the first time based on the high frequency power value of which the gain is adjusted and the audio signal of the first time.

The at least one circuit may be further configured to generate, for each channel or each object, an audio signal of the first time in which a high frequency component is included, based on a high frequency power value and the audio signal of the first time, and to perform the gain adjustment of the audio signal of the first time in which the high frequency component is included.

The at least one circuit may be further configured to assign an audio signal of a first object, of the plurality of objects, to each of at least some of the plurality of channels with a gain value based on the priority information, and to generate the audio signal of each of the plurality of channels.

A decoding method or a program according to the first embodiment of the present technology includes: acquiring priority information for each of a plurality of channels and/or a plurality of objects of one or more encoded audio signals; and decoding the plurality of channels and/or the plurality of objects according to the priority information.

According to the first embodiment of the present technology, priority information for each of a plurality of channels and/or a plurality of objects of one or more encoded audio signals is acquired, and the plurality of channels and/or the plurality of objects are decoded according to the priority information.

An encoding device according to a second embodiment of the present technology includes at least one circuit configured to generate priority information for each of a plurality of channels and/or a plurality of objects of an audio signal, and to store the priority information in a bit stream.

The at least one circuit may be configured to generate the priority information at least in part by generating a plurality of sets of priority information for each of the plurality of channels and/or the plurality of objects.

The at least one circuit may be configured to generate the plurality of sets of priority information for each of a plurality of calculation capabilities of decoding devices.

The at least one circuit may be configured to generate the priority information based at least in part on a sound pressure or a spectral shape of the audio signal.

The at least one circuit may be further configured to encode audio signals of the plurality of channels and/or the plurality of objects of the audio signal to form an encoded audio signal, and the at least one circuit may be further configured to store the priority information and the encoded audio signal in the bit stream.

An encoding method or a program according to the second embodiment of the present technology includes: generating priority information for each of a plurality of channels and/or a plurality of objects of an audio signal; and storing the priority information in a bit stream.

According to the second embodiment of the present technology, priority information for each of a plurality of channels and/or a plurality of objects of an audio signal is generated, and the priority information is stored in a bit stream.

Advantageous Effects of Invention

According to the first embodiment and the second embodiment, it is possible to reduce the amount of calculation for decoding.

The effects described here are not necessarily limited thereto, and may be any of the effects described in this disclosure.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram explaining a bit stream.

FIG. 2 is a diagram explaining an encoding.

FIG. 3 is a diagram explaining priority information.

FIG. 4 is a diagram explaining meanings of values of the priority information.

FIG. 5 is a diagram illustrating a configuration example of an encoding device.

FIG. 6 is a diagram illustrating a channel audio encoding unit.

FIG. 7 is a diagram illustrating an object audio encoding unit.

FIG. 8 is a flow chart explaining encoding processing.

FIG. 9 is a diagram illustrating a configuration example of a decoding device.

FIG. 10 is a diagram illustrating a configuration example of an unpacking/decoding unit.

FIG. 11 is a flow chart explaining decoding processing.

FIG. 12 is a flow chart explaining selective decoding processing.

FIG. 13 is a diagram illustrating another configuration example of the unpacking/decoding unit.

FIG. 14 is a flow chart explaining the selective decoding processing.

FIG. 15 is a diagram illustrating an example of syntax of metadata of an object.

FIG. 16 is a diagram explaining generation of an audio signal.

FIG. 17 is a diagram explaining generation of an audio signal.

FIG. 18 is a diagram explaining selection of an output destination of an MDCT coefficient.

FIG. 19 is a diagram explaining a gain adjustment of the audio signal and a power value in a high frequency band.

FIG. 20 is a diagram explaining a gain adjustment of the audio signal and the power value in the high frequency band.

FIG. 21 is a diagram illustrating another configuration example of the unpacking/decoding unit.

FIG. 22 is a flow chart explaining selective decoding processing.

FIG. 23 is a diagram explaining a gain adjustment of the audio signal.

FIG. 24 is a diagram explaining a gain adjustment of the audio signal.

FIG. 25 is a diagram illustrating another configuration example of the unpacking/decoding unit.

FIG. 26 is a flow chart explaining selective decoding processing.

FIG. 27 is a diagram explaining a VBAP gain.

FIG. 28 is a diagram explaining a VBAP gain.

FIG. 29 is a diagram illustrating another configuration example of the unpacking/decoding unit.

FIG. 30 is a flow chart explaining decoding processing.

FIG. 31 is a flow chart explaining selective decoding processing.

FIG. 32 is a diagram illustrating a configuration example of a computer.

DESCRIPTION OF EMBODIMENTS

Hereinafter, embodiments to which the present technology is applied will be described with reference to the drawings.

First Embodiment

<Overview of Present Technology>

In the present technology, when the audio signal of each channel constituting a multi-channel signal and the audio signal of each object are encoded, the priority information of the audio signal of each channel and the priority information of the audio signal of each object are transmitted, which makes it possible to decrease the amount of calculation in decoding.

In addition, in the present technology, at the decoding side, frequency-time conversion is performed in a case where the priority degree indicated by the priority information of a channel or an object is equal to or higher than a predetermined priority degree. In a case where the priority degree indicated by the priority information of a channel or an object is lower than the predetermined priority degree, frequency-time conversion is not performed and its result is set to zero. In this way, the amount of calculation in decoding the audio signals can be decreased.

Hereinafter, a case where the audio signal of each channel constituting the multi-channel signal and the audio signal of each object are encoded according to the AAC standard will be described. However, the same processing is performed in a case where the encoding is performed by another method.

For example, in a case where the audio signal of each channel constituting the multi-channel signal and the audio signals of a plurality of objects are encoded according to the AAC standard and transmitted, the audio signal of each channel or each object is encoded and transmitted for each frame.

Specifically, as illustrated in FIG. 1, the encoded audio signal and the information necessary for decoding the audio signal are stored in a plurality of elements (bit stream elements), and a bit stream made of those bit stream elements is transmitted.

In this example, in the bit stream for one frame, t elements EL1 to ELt are disposed in order from the head, and finally an identifier TERM indicating the end position of the information of the frame is disposed.

For example, the element EL1 disposed at the head is an ancillary data area called a data stream element (DSE), and information about each of the plurality of channels, such as information about down-mixing of the audio signal or identification information, is described in the DSE.

The encoded audio signals are stored in the elements EL2 to ELt subsequent to the element EL1.

Particularly, an element in which the audio signal of a single channel is stored is called an SCE, and an element in which the audio signals of a pair of two channels are stored is called a CPE. The audio signal of each object is also stored in an SCE.
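The frame layout described above can be pictured with a short Python sketch. The names `ElementType`, `Element`, `Frame`, and `is_well_formed` are hypothetical; they only model the order of elements stated in the text (DSE at the head, SCE/CPE elements after it, TERM at the end) and do not reproduce the actual AAC bit stream syntax.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ElementType(Enum):
    DSE = "DSE"    # data stream element: ancillary data (priority info, metadata)
    SCE = "SCE"    # single channel element: one channel or one object
    CPE = "CPE"    # channel pair element: a pair of two channels
    TERM = "TERM"  # identifier marking the end of the frame


@dataclass
class Element:
    kind: ElementType
    payload: bytes = b""


@dataclass
class Frame:
    elements: List[Element] = field(default_factory=list)

    def is_well_formed(self) -> bool:
        """Check the layout described in the text: DSE first, TERM last,
        and only SCE/CPE elements in between."""
        if len(self.elements) < 2:
            return False
        first, last = self.elements[0], self.elements[-1]
        middle = self.elements[1:-1]
        return (
            first.kind is ElementType.DSE
            and last.kind is ElementType.TERM
            and all(e.kind in (ElementType.SCE, ElementType.CPE) for e in middle)
        )
```

For example, a frame holding one channel pair and one object would be `Frame([Element(ElementType.DSE), Element(ElementType.CPE), Element(ElementType.SCE), Element(ElementType.TERM)])`, which passes the check.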

In the present technology, the priority information of the audio signal of each channel constituting the multi-channel signal and the priority information of the audio signal of each object are generated and stored in the DSE.

For example, as illustrated in FIG. 2, it is assumed that the audio signals of successive frames F11 to F13 are encoded.

In this case, an encoding device (an encoder) analyzes the priority degree of the audio signal of each channel for each of those frames, and generates the priority information of each channel, for example, as illustrated in FIG. 3. Similarly, the encoding device also generates the priority information of the audio signal of each object.

For example, the encoding device analyzes the priority degree of the audio signal based on a sound pressure or a spectral shape of the audio signal, and on a correlation of spectral shapes between channels or between objects.
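As a rough illustration of the sound-pressure criterion alone, the sketch below maps the RMS level of one frame to a value from 0 to 7. The function name, the -60 dBFS to 0 dBFS range, and the linear mapping are assumptions made for this example; the text does not specify how sound pressure, spectral shape, and inter-channel correlation are combined, and the latter two cues are omitted here.

```python
import math
from typing import Sequence


def priority_from_level(samples: Sequence[float],
                        floor_db: float = -60.0,
                        ceil_db: float = 0.0) -> int:
    """Map the RMS level of one frame (in dBFS) to a priority value 0..7.

    Hypothetical mapping: levels at or below floor_db get 0, levels at or
    above ceil_db get 7, with linear steps in between.
    """
    if not samples:
        return 0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return 0  # digital silence gets the lowest priority
    level_db = 20.0 * math.log10(rms)
    ratio = (level_db - floor_db) / (ceil_db - floor_db)
    ratio = min(max(ratio, 0.0), 1.0)  # clamp to [0, 1]
    return round(ratio * 7)
```

A frame of constant amplitude 0.1 sits at about -20 dBFS and maps to 5, while a full-scale frame maps to 7. In practice an encoder would likely weight such a level measure with spectral and correlation cues, as the text indicates.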

In FIG. 3, the priority information of each channel in a case where the total number of channels is M is illustrated as an example. That is, for each channel from the channel having a channel number of 0 to the channel having a channel number of M−1, a numerical value indicating the priority degree of the signal of that channel is illustrated as the priority information.

For example, the priority information of the channel having the channel number of 0 is 3, and the priority information of the channel having the channel number of 1 is 0. A channel having a predetermined channel number m (m=0, 1, . . . , M−1) is also called a channel m.

The value of the priority information illustrated in FIG. 3 is any value from 0 to 7, as illustrated in FIG. 4. As the value of the priority information increases, the priority degree at the time of reproducing the audio signal, that is, the importance degree, becomes higher.

Therefore, the audio signal whose priority information value is 0 has the lowest priority degree, and the audio signal whose priority information value is 7 has the highest priority degree.

In a case where the audio signals of multiple channels and the audio signals of a plurality of objects are simultaneously reproduced, the sound reproduced from these audio signals includes sounds that are less important than others. In other words, even if a specific sound among all the sounds is not reproduced, there exist sounds whose omission does not cause an uncomfortable feeling to a listener.

Therefore, if the decoding of an audio signal whose priority degree is low is skipped as necessary, it is possible to suppress the deterioration of the sound quality and decrease the amount of calculation for decoding. For this reason, in the encoding device, the importance degree of each audio signal at the time of reproduction, that is, the priority information indicating the priority in decoding, is assigned to each audio signal for each frame so that an audio signal that will not be decoded can be appropriately selected.

As described above, when the priority information for each audio signal is determined, the priority information is stored in the DSE of the element EL1 illustrated in FIG. 1. Particularly, in the example in FIG. 3, since the number of channels constituting the multi-channel audio signal is M, the priority information of each of the M channels, channel 0 to channel M−1, is stored in the DSE.

Similarly, the priority information of each object is also stored in the DSE of the element EL1. Here, for example, when it is assumed that there are N objects having object numbers from 0 to N−1, the priority information of each of the N objects is determined and stored in the DSE.
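One way to carry these values in the DSE is sketched below. Because each value lies in the range 0 to 7, it fits in 3 bits. The field layout (the M channel values followed by the N object values, most significant bit first, zero-padded to a byte boundary) and the function names are assumptions for illustration, not a syntax defined by this technology or by the AAC standard; the decoder is assumed to already know M and N.

```python
from typing import List, Sequence, Tuple


def pack_priorities(channel_prio: Sequence[int],
                    object_prio: Sequence[int]) -> bytes:
    """Pack channel then object priorities (each 0..7) as 3-bit fields,
    MSB first, zero-padded to a whole number of bytes."""
    acc, nbits = 0, 0
    for p in list(channel_prio) + list(object_prio):
        if not 0 <= p <= 7:
            raise ValueError(f"priority {p} does not fit in 3 bits")
        acc = (acc << 3) | p
        nbits += 3
    pad = (-nbits) % 8  # bits needed to reach the next byte boundary
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")


def unpack_priorities(payload: bytes, m: int, n: int) -> Tuple[List[int], List[int]]:
    """Inverse of pack_priorities; m and n must be known to the decoder."""
    total_bits = len(payload) * 8
    acc = int.from_bytes(payload, "big")
    values = []
    for i in range(m + n):
        shift = total_bits - 3 * (i + 1)
        values.append((acc >> shift) & 0b111)
    return values[:m], values[m:]
```

With three channels (priorities 3, 0, 7) and one object (priority 5), the 12 bits of priority data occupy two bytes, `b"\x63\xd0"`.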

Hereinafter, the object having a predetermined object number n (n=0, 1, . . . , N−1) is also called an object n.

In this manner, if the priority information is determined for each audio signal, the reproduction side, that is, the decoding side for the audio signal, can easily identify which audio signal is important at the time of reproduction and is to be decoded with priority, that is, to be used in reproduction.

Referring back to FIG. 2, for example, it is assumed that the priority information of the audio signals of frame F11 and frame F13 in a predetermined channel is 7, and the priority information of the audio signal of frame F12 in the predetermined channel is 0.

In addition, it is assumed that, at the side of decoding the audio signal, that is, in a decoding device (a decoder), the decoding is not performed for an audio signal whose priority degree is lower than a predetermined priority degree.

Here, for example, if the predetermined priority degree is called a threshold value and the threshold value is 4, in the example described above, the decoding is performed for the audio signals of frame F11 and frame F13 in the predetermined channel, whose priority information is 7.

On the other hand, the decoding is not performed for the audio signal of frame F12 in the predetermined channel, whose priority information is 0.

Therefore, in this example, the audio signal of frame F12 becomes a soundless signal, and the audio signals of frames F11 to F13 are synthesized to form the final audio signal of the predetermined channel.

More specifically, for example, at the time of encoding each audio signal, time-frequency conversion is performed on the audio signal, the information obtained by the time-frequency conversion is encoded, and the encoded data obtained as a result of the encoding is stored in the element.

Any processing may be used for the time-frequency conversion. However, hereinafter, the description will continue on the assumption that a modified discrete cosine transform (MDCT) is performed as the time-frequency conversion.
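For reference, one common definition of the MDCT and IMDCT pair is given below: a block of 2N time samples x[n] yields N coefficients X[k], and the IMDCT maps them back to 2N samples. Window functions, overlap-add, and the exact scaling differ between formulations (the AAC standard specifies its own), so this is a generic form rather than the one the encoder is required to use.

```latex
X[k] = \sum_{n=0}^{2N-1} x[n]\,\cos\!\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \qquad k = 0, \dots, N-1

y[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\,\cos\!\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \qquad n = 0, \dots, 2N-1
```

Evaluated directly, each IMDCT costs on the order of 2N times N multiply-adds per block (fast algorithms reduce this to O(N log N)). This is the work the decoder saves for each signal whose priority is below the threshold.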

In addition, in the decoding device, the encoded data is decoded, an inverse modified discrete cosine transform (IMDCT) is performed on the MDCT coefficients obtained as the result of the decoding, and the audio signal is generated. That is, here, the IMDCT is performed as the inverse conversion (frequency-time conversion) of the time-frequency conversion.

For this reason, more specifically, the IMDCT is performed for frame F11 and frame F13, whose priority information is equal to or higher than the threshold value of 4, and the audio signal is generated.

In addition, the IMDCT is not performed for frame F12, whose priority information is lower than the threshold value of 4; instead, the result of the IMDCT is set to 0 and the audio signal is generated. In this way, the audio signal of frame F12 becomes a soundless signal, that is, zero data.
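The per-frame decision can be sketched in Python with NumPy. `imdct` is a direct, unoptimized evaluation of the IMDCT with 1/N scaling; `frequency_time_convert` and the default threshold of 4 are hypothetical names and values taken from the example above. Windowing and overlap-add between frames are left out.

```python
import numpy as np


def imdct(coeffs: np.ndarray) -> np.ndarray:
    """Direct O(N^2) IMDCT: N coefficients -> 2N time samples (1/N scaling)."""
    n_coef = len(coeffs)
    n = np.arange(2 * n_coef)[:, None]   # time index, shape (2N, 1)
    k = np.arange(n_coef)[None, :]       # frequency index, shape (1, N)
    basis = np.cos(np.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
    return basis @ coeffs / n_coef


def frequency_time_convert(coeffs: np.ndarray, priority: int,
                           threshold: int = 4) -> np.ndarray:
    """Run the IMDCT only when priority >= threshold; otherwise return zero data."""
    if priority >= threshold:
        return imdct(coeffs)
    return np.zeros(2 * len(coeffs))     # skipped: no IMDCT is computed


# Frames F11, F12, F13 of one channel with priorities 7, 0, 7 (dummy coefficients)
frames = {"F11": (np.ones(8), 7), "F12": (np.ones(8), 0), "F13": (np.ones(8), 7)}
outputs = {name: frequency_time_convert(c, p) for name, (c, p) in frames.items()}
# outputs["F12"] is all zeros; F11 and F13 go through the IMDCT
```

The skipped branch only allocates a zero buffer, so the saving grows with the number of low-priority channels and objects in each frame.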

Furthermore, as another example, in the example illustrated in FIG. 3, when the threshold value is 4, among the audio signals of channel 0 to channel M−1, the decoding is not performed for the audio signals of channel 0, channel 1, and channel M−2, whose priority information values are lower than the threshold value of 4.
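The channel-level selection uses the same comparison. In the sketch below, only the values for channel 0 (3) and channel 1 (0) come from FIG. 3; the other values, and M = 6, are made up so that channel M−2 also falls below the threshold, as in the text. `split_by_threshold` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def split_by_threshold(priorities: Sequence[int],
                       threshold: int = 4) -> Dict[str, List[int]]:
    """Return the channel (or object) numbers to decode and to skip."""
    decode = [i for i, p in enumerate(priorities) if p >= threshold]
    skip = [i for i, p in enumerate(priorities) if p < threshold]
    return {"decode": decode, "skip": skip}


# Hypothetical priorities for M = 6 channels (only channels 0 and 1 follow FIG. 3)
priorities = [3, 0, 7, 5, 2, 6]
result = split_by_threshold(priorities)
print(result)  # {'decode': [2, 3, 5], 'skip': [0, 1, 4]}
```

Here channel 4 is channel M−2, so three of the six IMDCTs are avoided for this frame.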

As described above, depending on the result of comparing the priority information with the threshold value, the decoding is not performed for an audio signal whose priority degree indicated by the priority information is low, and thus, it is possible to minimize the deterioration of the sound quality and decrease the amount of calculation for decoding.

<Configuration Example of Encoding Device>

Next, specific embodiments of the encoding device and the decoding device to which the present technology is applied will be described. First, the encoding device will be described.

FIG. 5 is a diagram illustrating a configuration example of the encoding device to which the present technology is applied.

The encoding device 11 in FIG. 5 includes a channel audio encoding unit 21, an object audio encoding unit 22, a meta-data input unit 23, and a packing unit 24.

The audio signal of each channel of the multi-channel signal whose number of channels is M is supplied to the channel audio encoding unit 21. For example, the audio signal of each of the channels is supplied from a microphone corresponding to that channel. In FIG. 5, the letters “#0” to “#M−1” indicate the channel numbers of the respective channels.

The channel audio encoding unit 21 encodes the supplied audio signal of each channel, generates the priority information based on the audio signal, and supplies the encoded data obtained by the encoding and the priority information to the packing unit 24.

The audio signal of each of the N objects is supplied to the object audio encoding unit 22. For example, the audio signal of each object is supplied from a microphone corresponding to that object. In FIG. 5, the letters “#0” to “#N−1” indicate the object numbers of the respective objects.

The object audio encoding unit 22 encodes the supplied audio signal of each object, generates the priority information based on the audio signal, and supplies the encoded data obtained by the encoding and the priority information to the packing unit 24.

The meta-data input unit 23 supplies the meta-data of each object to the packing unit 24. For example, the meta-data of each object is assumed to be spatial position information indicating the position of the object in space. More specifically, for example, the spatial position information may be three-dimensional coordinate information indicating the positional coordinates of the object in three-dimensional space.

The packing unit 24 packs the encoded data and the priority information supplied from the channel audio encoding unit 21, the encoded data and the priority information supplied from the object audio encoding unit 22, and the meta-data supplied from the meta-data input unit 23, and generates and outputs a bit stream.

The bit stream obtained in this way includes, for each frame, the encoded data of each channel, the priority information of each channel, the encoded data of each object, the priority information of each object, and the meta-data of each object.

Here, the audio signals of the M channels and the audio signals of the N objects stored in the bit stream of one frame are audio signals of the same frame that are to be reproduced simultaneously.

Here, an example has been described in which the priority information of the audio signal of each channel or each object is generated for each audio signal of one frame. However, one priority information item may be generated for the audio signals of several frames, for example, in units of a predetermined time.
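If one priority item covers several frames, per-frame values have to be reduced to one value per unit. The sketch below takes the maximum over each group of frames; the function name and the choice of the maximum are assumptions, since the text does not say how the shared value is derived.

```python
from typing import List, Sequence


def group_priorities(per_frame: Sequence[int], frames_per_unit: int) -> List[int]:
    """Collapse per-frame priorities into one value per unit of frames.

    Hypothetical rule: a unit inherits the highest priority of its frames,
    so a unit that contains an important frame is never skipped.
    """
    if frames_per_unit < 1:
        raise ValueError("frames_per_unit must be >= 1")
    return [max(per_frame[i:i + frames_per_unit])
            for i in range(0, len(per_frame), frames_per_unit)]
```

For example, `group_priorities([7, 0, 7, 1, 2, 3, 5], 3)` gives `[7, 3, 5]`. Taking the maximum rather than the mean errs toward decoding more, trading some of the calculation saving for a lower risk of dropping audible content.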

<Configuration Example of Channel Audio Encoding Unit>

A more specific configuration of the channel audio encoding unit 21 in FIG. 5 is, for example, as illustrated in FIG. 6.

The channel audio encoding unit 21 illustrated in FIG. 6 includes an encoding unit 51 and a priority information generation unit 52.

The encoding unit 51 includes an MDCT unit 61, and encodes the audio signal of each channel supplied from the outside.

That is, the MDCT unit 61 performs the MDCT on the audio signal of each channel supplied from the outside. The encoding unit 51 encodes the MDCT coefficients of each channel obtained by the MDCT, and supplies the encoded data of each channel obtained as a result of the encoding, that is, the encoded audio signal, to the packing unit 24.
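The MDCT performed by the MDCT unit 61 can be sketched as follows. This is a direct O(N^2) evaluation with a sine window, written for clarity; the function names are hypothetical, and an actual AAC encoder uses a fast algorithm, block switching, and the window shapes defined by the standard (long blocks there span 2048 samples and yield 1024 coefficients).

```python
import numpy as np


def sine_window(two_n: int) -> np.ndarray:
    """Sine window of length 2N."""
    n = np.arange(two_n)
    return np.sin(np.pi * (n + 0.5) / two_n)


def mdct(block: np.ndarray) -> np.ndarray:
    """Direct O(N^2) MDCT of one block of 2N samples -> N coefficients."""
    two_n = len(block)
    if two_n % 2:
        raise ValueError("block length must be even")
    n_coef = two_n // 2
    x = block * sine_window(two_n)        # apply the analysis window
    n = np.arange(two_n)[None, :]         # time index, shape (1, 2N)
    k = np.arange(n_coef)[:, None]        # frequency index, shape (N, 1)
    basis = np.cos(np.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
    return basis @ x
```

Successive blocks overlap by N samples, so each frame of N new samples produces N coefficients for the encoding unit 51 to quantize and encode.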

In addition, the priority information generation unit 52 analyzes the audio signal of each channel supplied from the outside, generates the priority information of the audio signal of each channel, and supplies the priority information to the packing unit 24.

<Configuration Example of Object Audio Encoding Unit>

Furthermore, a more specific configuration of the object audio encoding unit 22 in FIG. 5 is, for example, as illustrated in FIG. 7.

The object audio encoding unit 22 illustrated in FIG. 7 includes an encoding unit 91 and a priority information generation unit 92.

The encoding unit 91 includes an MDCT unit 101, and encodes the audio signal of each object supplied from the outside.

That is, the MDCT unit 101 performs the MDCT on the audio signal of each object supplied from the outside. The encoding unit 91 encodes the MDCT coefficients of each object obtained by the MDCT, and supplies the encoded data of each object obtained as a result of the encoding, that is, the encoded audio signal, to the packing unit 24.

In addition, the priority information generation unit 92 analyzes the audio signal of each object supplied from the outside, generates the priority information of the audio signal of each object, and supplies the priority information to the packing unit 24.

<Description on Encoding Processing>

Next, the processing performed by the encoding device 11 will be described.

When the audio signals of the plurality of channels and the audio signals of the plurality of objects to be reproduced simultaneously are supplied for one frame, the encoding device 11 performs the encoding processing and outputs the bit stream including the encoded audio signals.

Hereinafter, the encoding processing by the encoding device 11 will be described with reference to the flow chart in FIG. 8. The encoding processing is performed for each frame of the audio signal.

In STEP S11, the priority information generation unit 52 of the channel audio encoding unit 21 generates the priority information of the supplied audio signal of each channel, and supplies the priority information to the packing unit 24. For example, the priority information generation unit 52 analyzes the audio signal for each channel, and generates the priority information based on the sound pressure or the spectral shape of the audio signal and the correlation of the spectral shapes between the channels.

In STEP S12, the packing unit 24 stores the priority information of the audio signal of each channel supplied from the priority information generation unit 52 in the DSE of the bit stream. That is, the priority information is stored in the head element of the bit stream.

In STEP S13, the priority information generation unit 92 of the object audio encoding unit 22 generates the priority information of the supplied audio signal of each object, and supplies the priority information to the packing unit 24. For example, the priority information generation unit 92 analyzes the audio signal for each object, and generates the priority information based on the sound pressure or the spectral shape of the audio signal and the correlation of the spectral shapes between the objects.

When the priority information of the audio signal of each channel or each object is generated, the number of audio signals to which each priority degree, that is, each value of the priority information, is assigned may be determined in advance with respect to the number of channels or the number of objects.

For example, in the example in FIG. 3, the number of audio signals to which the priority information “7” is assigned, that is, the number of channels, may be determined in advance as five, and the number of audio signals to which the priority information “6” is assigned may be determined in advance as three.
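A quota of this kind can be met by ranking the signals by an importance score and handing out the priority values from the top down. The scores, the function name, and the rule that leftover signals receive a default value of 0 are assumptions for this sketch; the text only states that the counts per priority value may be fixed in advance.

```python
from typing import Dict, List, Sequence


def assign_by_quota(scores: Sequence[float], quotas: Dict[int, int],
                    default: int = 0) -> List[int]:
    """Give the highest-scoring signals the highest priority values.

    quotas maps a priority value (0..7) to how many signals receive it;
    signals left over after all quotas are used get `default`.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [default] * len(scores)
    pos = 0
    for prio in sorted(quotas, reverse=True):  # hand out 7 first, then 6, ...
        for idx in order[pos:pos + quotas[prio]]:
            result[idx] = prio
        pos += quotas[prio]
    return result


# Ten channels with hypothetical importance scores; five get "7", three get "6"
scores = [0.9, 0.1, 0.5, 0.8, 0.3, 0.7, 0.6, 0.2, 0.4, 0.05]
print(assign_by_quota(scores, {7: 5, 6: 3}))  # [7, 0, 7, 7, 6, 7, 7, 6, 6, 0]
```

Fixing the counts in advance bounds how many signals can reach each level, which keeps the decoder's worst-case workload predictable for a given threshold.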

In STEP S14, the packing unit 24 stores the priority information of theaudio signal of each object supplied from the priority informationgeneration unit 92 in the DSE of the bit stream.

In STEP S15, the packing unit 24 stores the meta-data of each object inthe DSE of the bit stream.

For example, the meta-data input unit 23 acquires the meta-data of eachobject by receiving an input from a user, communicating with theoutside, or performing reading from a storage region outside, andsupplies the meta-data to the packing unit 24. The packing unit 24stores the meta-data supplied in this manner from the meta-data inputunit 23 in the DSE.

As a result of the above-described processing, the priority information of the audio signals of all the channels, the priority information of the audio signals of all the objects, and the meta-data of all the objects are stored in the DSE of the bit stream.

In STEP S16, the encoding unit 51 of the channel audio encoding unit 21 encodes the supplied audio signal of each channel.

Specifically, the MDCT unit 61 performs the MDCT on the audio signal of each channel, and the encoding unit 51 encodes the MDCT coefficients of each channel obtained by the MDCT and supplies the resulting encoded data of each channel to the packing unit 24.
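For readers unfamiliar with the transform, the following sketch shows MDCT analysis together with the matching IMDCT and overlap-add that the decoding side later performs. It is an illustrative assumption, not the patent's implementation: the patent does not specify the window or frame length (AAC-family codecs typically use 2048-sample blocks, that is, N = 1024, with a sine or KBD window), so this sketch uses a sine window, a small N, and a direct O(N^2) formula instead of a fast algorithm. All function names are hypothetical.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct(block):
    """Forward MDCT: 2N (already windowed) samples -> N coefficients."""
    n = len(block) // 2
    return [
        sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N samples, scaled by 2/N for a sine window."""
    n = len(coeffs)
    return [
        2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                      for k in range(n))
        for i in range(2 * n)
    ]


def encode_frames(signal, n):
    """Cut the signal into 50%-overlapping 2N blocks, window them, and MDCT each."""
    w = sine_window(2 * n)
    # N zeros in front, then pad the tail so the length is a multiple of N plus N.
    padded = [0.0] * n + list(signal) + [0.0] * (n + (-len(signal)) % n)
    return [mdct([w[i] * padded[start + i] for i in range(2 * n)])
            for start in range(0, len(padded) - n, n)]


def decode_frames(frames, n, length):
    """IMDCT each frame, window it, and overlap-add the halves of adjacent blocks."""
    w = sine_window(2 * n)
    out = [0.0] * (n * (len(frames) + 1))
    for f, coeffs in enumerate(frames):
        for i, v in enumerate(imdct(coeffs)):
            out[f * n + i] += w[i] * v
    return out[n:n + length]  # drop the leading padding


signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(20)]
frames = encode_frames(signal, 8)
reconstructed = decode_frames(frames, 8, len(signal))
```

With the sine window and the 2/N scaling in `imdct`, the time-domain aliasing of adjacent blocks cancels in the overlap-add, so every original sample is reconstructed up to rounding error.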

In STEP S17, the packing unit 24 stores the encoded data of the audio signal of each channel, supplied from the encoding unit 51, in an SCE or a CPE of the bit stream. That is, the encoded data is stored in the elements disposed after the DSE in the bit stream.

In STEP S18, the encoding unit 91 of the object audio encoding unit 22 encodes the supplied audio signal of each object.

Specifically, the MDCT unit 101 performs the MDCT on the audio signal of each object, and the encoding unit 91 encodes the MDCT coefficients of each object obtained by the MDCT and supplies the resulting encoded data of each object to the packing unit 24.

In STEP S19, the packing unit 24 stores the encoded data of the audio signal of each object, supplied from the encoding unit 91, in an SCE of the bit stream. That is, the encoded data is stored in elements disposed after the DSE in the bit stream.

As a result of the above-described processing, a bit stream is obtained for the frame to be processed in which the priority information and the encoded data of the audio signals of all the channels, the priority information and the encoded data of the audio signals of all the objects, and the meta-data of all the objects are stored.
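The resulting frame layout, with the DSE as the head element followed by the elements carrying encoded data, can be modeled as an ordered list of elements. This is a simplified, hypothetical model: `pack_frame` and the dictionary keys are illustrative, it does not follow the actual AAC bit syntax, and it places every channel in its own SCE rather than pairing channels into CPEs.

```python
def pack_frame(channel_data, channel_priority, object_data, object_priority,
               object_metadata):
    """Return one frame as an ordered list of (element_type, payload) pairs."""
    if len(channel_data) != len(channel_priority):
        raise ValueError("one priority value is needed per channel")
    if len(object_data) != len(object_priority):
        raise ValueError("one priority value is needed per object")
    # Head element: priority information and object meta-data (STEPS S12-S15).
    dse = {
        "channel_priority": list(channel_priority),
        "object_priority": list(object_priority),
        "object_metadata": list(object_metadata),
    }
    frame = [("DSE", dse)]
    # Encoded channel data after the DSE (STEP S17; CPE pairing omitted).
    frame += [("SCE", data) for data in channel_data]
    # Encoded object data after the DSE (STEP S19).
    frame += [("SCE", data) for data in object_data]
    return frame


frame = pack_frame([b"ch0", b"ch1"], [7, 3], [b"obj0"], [5], [{"azimuth": 30}])
```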

In STEP S20, the packing unit 24 outputs the obtained bit stream andends the encoding processing.

As described above, the encoding device 11 generates the priority information of the audio signal of each channel and the priority information of the audio signal of each object, stores the priority information in the bit stream, and outputs the bit stream. Therefore, the decoding side can easily ascertain which audio signals have a higher priority degree.

In this way, the decoding side can selectively decode the encoded audio signals according to the priority information. As a result, it is possible to minimize the deterioration of the sound quality of the sound reproduced from the audio signals while decreasing the amount of calculation for decoding.

In particular, because the priority information of the audio signal of each object is stored in the bit stream, the decoding side can decrease not only the amount of calculation for decoding but also the amount of calculation for subsequent processing such as rendering.

<Configuration Example of Decoding Device>

Next, a decoding device will be described that receives the bit stream output from the encoding device 11 described above and decodes the encoded data included in the bit stream.

Such a decoding device is configured, for example, as illustrated in FIG. 9.

The decoding device 151 illustrated in FIG. 9 includes an unpacking/decoding unit 161, a rendering unit 162, and a mixing unit 163.

The unpacking/decoding unit 161 acquires the bit stream output from the encoding device 11 and performs unpacking and decoding of the bit stream.

The unpacking/decoding unit 161 supplies the audio signal of each object obtained by the unpacking and decoding, together with the meta-data of each object, to the rendering unit 162. At this time, the unpacking/decoding unit 161 decodes the encoded data of each object according to the priority information included in the bit stream.

In addition, the unpacking/decoding unit 161 supplies the audio signal of each channel obtained by the unpacking and decoding to the mixing unit 163. At this time, the unpacking/decoding unit 161 decodes the encoded data of each channel according to the priority information included in the bit stream.

The rendering unit 162 generates audio signals of M channels based on the audio signal of each object supplied from the unpacking/decoding unit 161 and the spatial position information contained in the meta-data of each object, and supplies the audio signals to the mixing unit 163. At this time, the rendering unit 162 generates the audio signal of each of the M channels in such a manner that the sound image of each object is correctly positioned at the position indicated by the spatial position information of that object.

The mixing unit 163 performs, for each channel, weighted addition of the audio signal of that channel supplied from the unpacking/decoding unit 161 and the audio signal of that channel supplied from the rendering unit 162, and thereby generates the final audio signal of each channel. The mixing unit 163 supplies the final audio signal of each channel obtained in this way to the external speaker corresponding to that channel to reproduce the sound.

<Configuration Example of Unpacking/Decoding Unit>

More specifically, the unpacking/decoding unit 161 of the decoding device 151 illustrated in FIG. 9 is configured, for example, as illustrated in FIG. 10.

The unpacking/decoding unit 161 in FIG. 10 includes a priority information acquisition unit 191, a channel audio signal acquisition unit 192, a channel audio signal decoding unit 193, an output selection unit 194, a zero value output unit 195, an IMDCT unit 196, an object audio signal acquisition unit 197, an object audio signal decoding unit 198, an output selection unit 199, a zero value output unit 200, and an IMDCT unit 201.

The priority information acquisition unit 191 acquires the priority information of the audio signal of each channel from the supplied bit stream and supplies it to the output selection unit 194, and acquires the priority information of the audio signal of each object from the bit stream and supplies it to the output selection unit 199.

In addition, the priority information acquisition unit 191 acquires the meta-data of each object from the supplied bit stream and supplies the meta-data to the rendering unit 162, and supplies the bit stream to the channel audio signal acquisition unit 192 and the object audio signal acquisition unit 197.

The channel audio signal acquisition unit 192 acquires the encoded data of each channel from the bit stream supplied from the priority information acquisition unit 191 and supplies the encoded data to the channel audio signal decoding unit 193. The channel audio signal decoding unit 193 decodes the encoded data of each channel supplied from the channel audio signal acquisition unit 192 and supplies the MDCT coefficients obtained as a result of the decoding to the output selection unit 194.

The output selection unit 194 selectively switches the output destination of the MDCT coefficients of each channel supplied from the channel audio signal decoding unit 193, based on the priority information of each channel supplied from the priority information acquisition unit 191.

That is, in a case where the priority information of a given channel is lower than a predetermined threshold value P, the output selection unit 194 supplies the MDCT coefficients of that channel to the zero value output unit 195 as zero values. In a case where the priority information of a given channel is equal to or higher than the threshold value P, the output selection unit 194 supplies the MDCT coefficients of that channel, as supplied from the channel audio signal decoding unit 193, to the IMDCT unit 196.

The zero value output unit 195 generates an audio signal based on the MDCT coefficients supplied from the output selection unit 194 and supplies the audio signal to the mixing unit 163. In this case, since the MDCT coefficients are zero, a soundless audio signal is generated.

The IMDCT unit 196 performs the IMDCT based on the MDCT coefficients supplied from the output selection unit 194 to generate an audio signal, and supplies the audio signal to the mixing unit 163.
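The two output destinations of an output selection unit can be sketched as below. The point it illustrates is that the zero value output path never runs the transform: since the IMDCT of all-zero coefficients is silence, returning zeros directly yields the same soundless signal at essentially no cost. The names are hypothetical, and `imdct` here is the direct formula with a 2/N scaling, without windowing or overlap-add.

```python
import math


def imdct(coeffs):
    """Inverse MDCT of N coefficients into 2N samples (before overlap-add)."""
    n = len(coeffs)
    return [
        2.0 / n * sum(
            coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for k in range(n)
        )
        for i in range(2 * n)
    ]


def route_and_reconstruct(coeffs, priority, threshold):
    """Model of an output selection unit followed by its two destinations."""
    if priority >= threshold:
        # IMDCT unit path (direct O(N^2) formula here; real decoders use a fast transform).
        return imdct(coeffs)
    # Zero value output unit path: the coefficients are treated as zero, and the
    # IMDCT of zero coefficients is silence, so no transform is computed.
    return [0.0] * (2 * len(coeffs))


coeffs = [1.0, 0.5, 0.0, -0.25]
silent = route_and_reconstruct(coeffs, priority=2, threshold=4)
audible = route_and_reconstruct(coeffs, priority=5, threshold=4)
```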

The object audio signal acquisition unit 197 acquires the encoded data of each object from the bit stream supplied from the priority information acquisition unit 191 and supplies the encoded data to the object audio signal decoding unit 198. The object audio signal decoding unit 198 decodes the encoded data of each object supplied from the object audio signal acquisition unit 197 and supplies the MDCT coefficients obtained as a result of the decoding to the output selection unit 199.

The output selection unit 199 selectively switches the output destination of the MDCT coefficients of each object supplied from the object audio signal decoding unit 198, based on the priority information of each object supplied from the priority information acquisition unit 191.

That is, in a case where the priority information of a given object is lower than a predetermined threshold value Q, the output selection unit 199 supplies the MDCT coefficients of that object to the zero value output unit 200 as zero values. In a case where the priority information of a given object is equal to or higher than the threshold value Q, the output selection unit 199 supplies the MDCT coefficients of that object, as supplied from the object audio signal decoding unit 198, to the IMDCT unit 201.

The threshold value Q may be the same as the threshold value P or may differ from it. By setting the threshold values P and Q appropriately according to, for example, the calculation capability of the decoding device 151, the amount of calculation for decoding the audio signals can be reduced to an amount that the decoding device 151 can handle in real time.
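One possible policy (an assumption, not stated in the patent) for deriving P or Q from the calculation capability is to express that capability as the number of signals the device can run through the IMDCT in real time, and then pick the lowest threshold that keeps the number of signals passing it within that budget. `choose_threshold` and `budget` are hypothetical names.

```python
def choose_threshold(priorities, budget):
    """Lowest threshold such that at most `budget` signals have priority >= it.

    priorities: priority degrees of the signals in the frame (e.g. 0..7).
    budget:     number of signals the device can afford to IMDCT in real time.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    top = max(priorities, default=0)
    # At threshold top + 1 no signal passes, so the loop always returns.
    for threshold in range(top + 2):
        if sum(1 for p in priorities if p >= threshold) <= budget:
            return threshold


channel_priorities = [7, 7, 6, 3, 0, 5]
threshold_p = choose_threshold(channel_priorities, budget=3)
```

A lower threshold lets more signals through, so searching upward from 0 keeps as many signals as the budget allows.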

The zero value output unit 200 generates an audio signal based on the MDCT coefficients supplied from the output selection unit 199 and supplies the audio signal to the rendering unit 162. In this case, since the MDCT coefficients are zero, a soundless audio signal is generated.

The IMDCT unit 201 performs the IMDCT based on the MDCT coefficients supplied from the output selection unit 199 to generate an audio signal, and supplies the audio signal to the rendering unit 162.

<Description on Decoding Processing>

Next, an operation of the decoding device 151 will be described.

When the bit stream of one frame is supplied from the encoding device 11, the decoding device 151 performs the decoding processing to generate audio signals, and outputs the audio signals to the speakers. Hereafter, the decoding processing performed by the decoding device 151 will be described with reference to the flow chart in FIG. 11.

In STEP S51, the unpacking/decoding unit 161 acquires the bit stream transmitted from the encoding device 11. That is, the bit stream is received.

In STEP S52, the unpacking/decoding unit 161 performs selective decoding processing.

The selective decoding processing will be described in detail later. In outline, the encoded data of each channel and the encoded data of each object are selectively decoded based on the priority information. The audio signal of each channel obtained as a result of the selective decoding is then supplied to the mixing unit 163, and the audio signal of each object obtained as a result of the selective decoding is supplied to the rendering unit 162. In addition, the meta-data of each object obtained from the bit stream is supplied to the rendering unit 162.

In STEP S53, the rendering unit 162 performs rendering of the audio signal of each object based on the audio signal of each object supplied from the unpacking/decoding unit 161 and the spatial position information contained in the meta-data of each object.

For example, the rendering unit 162 generates the audio signal of each channel by vector base amplitude panning (VBAP) based on the spatial position information, in such a manner that the sound image of each object is correctly positioned at the position indicated by the spatial position information, and supplies the audio signals to the mixing unit 163.
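The VBAP step named above can be illustrated for the simplest case, a single loudspeaker pair in the horizontal plane. This sketch of 2-D pairwise VBAP solves for gains g1 and g2 such that g1·l1 + g2·l2 points in the source direction p, where l1 and l2 are the unit vectors toward the two loudspeakers, and then normalizes the gains to constant power. Selecting the active pair (or triplet, in 3-D) from an M-channel layout is omitted, and `vbap_2d` is a hypothetical name.

```python
import math


def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Power-normalized gains (g1, g2) for one loudspeaker pair (2-D VBAP)."""
    a1, a2, s = (math.radians(d) for d in (spk1_deg, spk2_deg, source_deg))
    # Loudspeaker unit vectors l1, l2 and source direction p.
    l11, l12 = math.cos(a1), math.sin(a1)
    l21, l22 = math.cos(a2), math.sin(a2)
    p1, p2 = math.cos(s), math.sin(s)
    # Solve g1 * l1 + g2 * l2 = p with Cramer's rule.
    det = l11 * l22 - l21 * l12
    if abs(det) < 1e-12:
        raise ValueError("loudspeakers must not be collinear")
    g1 = (p1 * l22 - l21 * p2) / det
    g2 = (l11 * p2 - p1 * l12) / det
    # Constant-power normalization: g1**2 + g2**2 == 1.
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


center = vbap_2d(0.0, 30.0, -30.0)    # source midway between the pair
on_left = vbap_2d(30.0, 30.0, -30.0)  # source exactly at the first speaker
```

A negative gain indicates that the source lies outside the chosen pair; a full renderer would instead use the pair for which both gains are non-negative.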

In STEP S54, the mixing unit 163 performs, for each channel, weighted addition of the audio signal of that channel supplied from the unpacking/decoding unit 161 and the audio signal of that channel supplied from the rendering unit 162, and supplies the resulting audio signals to the external speakers. In this way, the audio signal of each channel is supplied to the speaker corresponding to that channel, and each speaker reproduces sound based on the audio signal supplied to it.
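The per-channel weighted addition of STEP S54 can be sketched as follows, assuming the decoded channel signals and the rendered object signals already share the same M-channel layout and length. The weights `w_channel` and `w_object` and the function name are hypothetical; the patent does not state how the weights are chosen.

```python
def mix(channel_signals, rendered_signals, w_channel=1.0, w_object=1.0):
    """Weighted, per-channel addition of channel audio and rendered object audio.

    channel_signals:  M lists of samples from the unpacking/decoding unit.
    rendered_signals: M lists of samples from the rendering unit.
    """
    if len(channel_signals) != len(rendered_signals):
        raise ValueError("both inputs must have the same number of channels")
    mixed = []
    for ch, ob in zip(channel_signals, rendered_signals):
        if len(ch) != len(ob):
            raise ValueError("signals of one channel must have equal length")
        mixed.append([w_channel * a + w_object * b for a, b in zip(ch, ob)])
    return mixed


out = mix([[1.0, 2.0], [0.0, 0.0]], [[0.5, 0.5], [1.0, -1.0]],
          w_channel=1.0, w_object=0.5)
```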

When the audio signal of each channel has been supplied to the speakers, the decoding processing ends.

As described above, the decoding device 151 acquires the priority information from the bit stream and decodes the encoded data of each channel and each object according to the priority information.

<Description on Selective Decoding Processing>

Subsequently, the selective decoding processing corresponding to the processing in STEP S52 in FIG. 11 will be described with reference to the flow chart in FIG. 12.

In STEP S81, the priority information acquisition unit 191 acquires the priority information of the audio signal of each channel and the priority information of the audio signal of each object from the supplied bit stream, and supplies these items of priority information to the output selection unit 194 and the output selection unit 199, respectively.

In addition, the priority information acquisition unit 191 acquires the meta-data of each object from the bit stream and supplies the meta-data to the rendering unit 162, and supplies the bit stream to the channel audio signal acquisition unit 192 and the object audio signal acquisition unit 197.

In STEP S82, the channel audio signal acquisition unit 192 sets the channel number of the channel to be processed to 0 and holds the channel number.

In STEP S83, the channel audio signal acquisition unit 192 determines whether or not the held channel number is less than the number of channels M.

In a case where it is determined in STEP S83 that the channel number is less than M, in STEP S84 the channel audio signal decoding unit 193 decodes the encoded data of the audio signal of the channel to be processed.

That is, the channel audio signal acquisition unit 192 acquires the encoded data of the channel to be processed from the bit stream supplied from the priority information acquisition unit 191, and supplies the encoded data to the channel audio signal decoding unit 193.

Then, the channel audio signal decoding unit 193 decodes the encoded data supplied from the channel audio signal acquisition unit 192 and supplies the MDCT coefficients obtained as a result of the decoding to the output selection unit 194.

In STEP S85, the output selection unit 194 determines whether or not the priority information of the channel to be processed, supplied from the priority information acquisition unit 191, is equal to or higher than the threshold value P specified by a higher-level control device (not illustrated). Here, the threshold value P is determined, for example, according to the calculation capability of the decoding device 151.

In a case where it is determined in STEP S85 that the priority information is equal to or higher than the threshold value P, the output selection unit 194 supplies the MDCT coefficients of the channel to be processed, supplied from the channel audio signal decoding unit 193, to the IMDCT unit 196, and the process proceeds to STEP S86. In this case, the priority degree of the audio signal of the channel to be processed is equal to or higher than the predetermined priority degree. Therefore, the decoding of that channel, more specifically the IMDCT, is performed.

In STEP S86, the IMDCT unit 196 performs the IMDCT based on the MDCT coefficients supplied from the output selection unit 194, generates the audio signal of the channel to be processed, and supplies the audio signal to the mixing unit 163. After the audio signal is generated, the process proceeds to STEP S87.

On the other hand, in a case where it is determined in STEP S85 that the priority information is lower than the threshold value P, the output selection unit 194 supplies the MDCT coefficients to the zero value output unit 195 as zero values.

The zero value output unit 195 generates the audio signal of the channel to be processed from the zero-valued MDCT coefficients supplied from the output selection unit 194, and supplies the audio signal to the mixing unit 163. Consequently, the zero value output unit 195 performs substantially no processing, such as the IMDCT, for generating the audio signal.

The audio signal generated by the zero value output unit 195 is a soundless signal. After the audio signal is generated, the process proceeds to STEP S87.

If it is determined in STEP S85 that the priority information is lower than the threshold value P, or if the audio signal is generated in STEP S86, then in STEP S87 the channel audio signal acquisition unit 192 adds one to the held channel number to update the channel number of the channel to be processed.

After the channel number is updated, the process returns to STEP S83 and the processing described above is repeated. That is, the audio signal of the new channel to be processed is generated.

In a case where it is determined in STEP S83 that the channel number of the channel to be processed is not less than M, the audio signals of all the channels have been obtained, and the process proceeds to STEP S88.
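The channel loop of STEP S82 to STEP S87 can be sketched as follows. As in the flow chart, the encoded data of every channel is decoded into MDCT coefficients; only the IMDCT is skipped for channels whose priority is below P. `decode` and `imdct` are hypothetical callables standing in for the channel audio signal decoding unit 193 and the IMDCT unit 196. The object loop of STEP S88 to STEP S93 has the same shape, with N objects and the threshold value Q.

```python
def selective_decode_channels(encoded, priorities, threshold_p, decode, imdct):
    """Loop of STEP S82-S87; returns (audio per channel, number of IMDCTs run)."""
    outputs = []
    imdct_count = 0
    m = len(encoded)                       # number of channels M
    ch = 0                                 # STEP S82: start at channel 0
    while ch < m:                          # STEP S83: any channels left?
        coeffs = decode(encoded[ch])       # STEP S84: decode to MDCT coefficients
        if priorities[ch] >= threshold_p:  # STEP S85: compare with P
            outputs.append(imdct(coeffs))  # STEP S86: IMDCT unit path
            imdct_count += 1
        else:
            # Zero value output path: a soundless signal, no IMDCT.
            outputs.append([0.0] * (2 * len(coeffs)))
        ch += 1                            # STEP S87: next channel
    return outputs, imdct_count


outs, n_imdct = selective_decode_channels(
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    priorities=[7, 2, 5],
    threshold_p=5,
    decode=lambda data: data,                     # stub: data already holds coefficients
    imdct=lambda coeffs: [1.0] * (2 * len(coeffs)),  # stub IMDCT for illustration
)
```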

In STEP S88, the object audio signal acquisition unit 197 sets the object number of the object to be processed to 0 and holds the object number.

In STEP S89, the object audio signal acquisition unit 197 determines whether or not the held object number is less than the number of objects N.

In a case where it is determined in STEP S89 that the object number is less than N, in STEP S90 the object audio signal decoding unit 198 decodes the encoded data of the audio signal of the object to be processed.

That is, the object audio signal acquisition unit 197 acquires the encoded data of the object to be processed from the bit stream supplied from the priority information acquisition unit 191, and supplies the encoded data to the object audio signal decoding unit 198.

Then, the object audio signal decoding unit 198 decodes the encoded data supplied from the object audio signal acquisition unit 197 and supplies the MDCT coefficients obtained as a result of the decoding to the output selection unit 199.

In STEP S91, the output selection unit 199 determines whether or not the priority information of the object to be processed, supplied from the priority information acquisition unit 191, is equal to or higher than the threshold value Q specified by a higher-level control device (not illustrated). Here, the threshold value Q is determined, for example, according to the calculation capability of the decoding device 151.

In a case where it is determined in STEP S91 that the priority information is equal to or higher than the threshold value Q, the output selection unit 199 supplies the MDCT coefficients of the object to be processed, supplied from the object audio signal decoding unit 198, to the IMDCT unit 201, and the process proceeds to STEP S92.

In STEP S92, the IMDCT unit 201 performs the IMDCT based on the MDCT coefficients supplied from the output selection unit 199, generates the audio signal of the object to be processed, and supplies the audio signal to the rendering unit 162. After the audio signal is generated, the process proceeds to STEP S93.

On the other hand, in a case where it is determined in STEP S91 that the priority information is lower than the threshold value Q, the output selection unit 199 supplies the MDCT coefficients to the zero value output unit 200 as zero values.

The zero value output unit 200 generates the audio signal of the object to be processed from the zero-valued MDCT coefficients supplied from the output selection unit 199, and supplies the audio signal to the rendering unit 162. Consequently, the zero value output unit 200 performs substantially no processing, such as the IMDCT, for generating the audio signal.

The audio signal generated by the zero value output unit 200 is a soundless signal. After the audio signal is generated, the process proceeds to STEP S93.

If it is determined in STEP S91 that the priority information is lower than the threshold value Q, or if the audio signal is generated in STEP S92, then in STEP S93 the object audio signal acquisition unit 197 adds one to the held object number to update the object number of the object to be processed.

After the object number is updated, the process returns to STEP S89 and the processing described above is repeated. That is, the audio signal of the new object to be processed is generated.

In a case where it is determined in STEP S89 that the object number of the object to be processed is not less than N, the audio signals of all the channels and all the objects have been obtained, so the selective decoding processing ends and the process proceeds to STEP S53 in FIG. 11.

As described above, the decoding device 151 compares the priority information of each channel and each object with the corresponding threshold value and, for each channel and each object of the frame to be processed, determines whether or not to decode the encoded audio signal.

That is, the decoding device 151 decodes only those encoded audio signals selected according to the priority information of each audio signal, and does not decode the remaining audio signals.

In this way, only the audio signals with a high priority degree can be selectively decoded to match the reproduction environment. Therefore, it is possible to minimize the deterioration of the sound quality of the reproduced sound while decreasing the amount of calculation for decoding.

Furthermore, since the encoded audio signals are decoded based on the priority information of the audio signal of each object, it is possible to decrease not only the amount of calculation for decoding the audio signals but also the amount of calculation for subsequent processing, such as the processing in the rendering unit 162.

MODIFICATION EXAMPLE 1 OF FIRST EMBODIMENT

<Priority Information>

In the above description, one priority information item is generated for each audio signal of each channel and each object. However, a plurality of priority information items may be generated for each audio signal.

In this case, for example, a separate priority information item can be generated for each level of calculation capability on the decoding side, that is, for each amount of calculation available for decoding.

Specifically, for example, the priority information items for a device having a calculation capability equivalent to two channels are generated based on the amount of calculation needed to decode audio signals equivalent to two channels in real time.

In the priority information items for a device equivalent to two channels, for example, the priority information is generated such that a large proportion of all the audio signals are assigned a low priority degree, that is, a value close to 0.

Similarly, for example, the priority information items for a device having a calculation capability equivalent to 24 channels are generated based on the amount of calculation needed to decode audio signals equivalent to 24 channels in real time. In the priority information items for a device equivalent to 24 channels, for example, the priority information is generated such that a large proportion of all the audio signals are assigned a high priority degree, that is, a value close to 7.

In this case, for example, in STEP S11 in FIG. 8, the priority information generation unit 52 generates the priority information items for the device equivalent to two channels for the audio signal of each channel, adds to them an identifier indicating that they are for the device equivalent to two channels, and supplies them to the packing unit 24.

Furthermore, in STEP S11, the priority information generation unit 52 generates the priority information items for the device equivalent to 24 channels for the audio signal of each channel, adds to them an identifier indicating that they are for the device equivalent to 24 channels, and supplies them to the packing unit 24.

Similarly, in STEP S13 in FIG. 8, the priority information generation unit 92 also generates the priority information items for the device equivalent to two channels and those for the device equivalent to 24 channels, adds the identifiers, and supplies them to the packing unit 24.
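One hypothetical way to produce several priority information items, each tagged with an identifier for a device class, is sketched below. It assumes an importance score per signal and a capacity per device class (the number of signals that class can decode in real time), and simply assigns the highest degree 7 to the top-ranked signals within that capacity and 0 to the rest. The patent only requires that the item for a low-capability device assign values close to 0 to many signals and the item for a high-capability device assign values close to 7 to many signals; the names `priority_items_by_profile`, `"2ch"` and `"24ch"` are illustrative.

```python
def priority_items_by_profile(scores, profiles):
    """Build one priority list per device class, keyed by its identifier.

    scores:   one importance score per audio signal (hypothetical input).
    profiles: identifier -> number of signals that device class can decode
              in real time.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    items = {}
    for identifier, capacity in profiles.items():
        priorities = [0] * len(scores)  # low degree for signals beyond capacity
        for i in order[:capacity]:
            priorities[i] = 7           # high degree for the top-ranked signals
        items[identifier] = priorities
    return items


items = priority_items_by_profile([0.2, 0.9, 0.5, 0.7], {"2ch": 2, "24ch": 24})
```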

In this way, for example, a plurality of priority information items are obtained that correspond to the calculation capabilities of reproduction devices such as a portable audio player, a multi-functional mobile phone, a tablet-type computer, a television receiver, a personal computer, and high-quality audio equipment.

For example, the calculation capability of a reproduction device such as a portable audio player is relatively low. In such a reproduction device, if the encoded audio signals are decoded based on the priority information items for the device equivalent to two channels, the audio signals can be reproduced in real time.

As described above, in a case where a plurality of priority information items are generated for one audio signal, in the decoding device 151, for example, a higher-level control device instructs the priority information acquisition unit 191 which of the plurality of priority information items to use for decoding. This instruction is given, for example, by supplying the identifier.

Which identifier's priority information is used may also be determined in advance for each decoding device 151.

For example, in a case where the identifier of the priority information to be used is determined in advance in the priority information acquisition unit 191, or in a case where the identifier is designated by the higher-level control device, the priority information acquisition unit 191 acquires, in STEP S81 in FIG. 12, the priority information to which the determined identifier is added. The acquired priority information is then supplied from the priority information acquisition unit 191 to the output selection unit 194 or the output selection unit 199.

In other words, among the plurality of priority information items stored in the bit stream, one appropriate priority information item is selected according to the calculation capability of the decoding device 151, specifically, of the unpacking/decoding unit 161.

In this case, different identifiers may be used for the priority information of each channel and the priority information of each object when reading the priority information from the bit stream.
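On the decoding side, selecting the priority information by identifier, with separate identifiers for channels and objects, can be sketched as follows. The representation of the DSE contents as `(kind, identifier, priorities)` tuples and the function name `acquire_priority` are assumptions for illustration, not the bit stream syntax.

```python
def acquire_priority(dse_items, channel_id, object_id):
    """Return (channel_priorities, object_priorities) for the requested identifiers.

    dse_items: iterable of (kind, identifier, priorities) tuples, where kind is
               "channel" or "object" (a hypothetical view of the DSE contents).
    """
    wanted = {"channel": channel_id, "object": object_id}
    chosen = {}
    for kind, identifier, priorities in dse_items:
        if wanted.get(kind) == identifier:
            chosen[kind] = list(priorities)
    missing = sorted(set(wanted) - set(chosen))
    if missing:
        raise ValueError(f"no priority information item for: {missing}")
    return chosen["channel"], chosen["object"]


dse_items = [
    ("channel", "2ch", [7, 0]),
    ("channel", "24ch", [7, 7]),
    ("object", "2ch", [0, 7, 0]),
    ("object", "24ch", [7, 7, 7]),
]
portable = acquire_priority(dse_items, "2ch", "2ch")
mixed_ids = acquire_priority(dse_items, "24ch", "2ch")
```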

As described above, by selectively acquiring a specific priority information item from among the plurality of priority information items included in the bit stream, appropriate priority information can be selected according to the calculation capability of the decoding device 151 and used for decoding. In this way, the audio signals can be reproduced in real time by any decoding device 151.

Second Embodiment

<Configuration Example of Unpacking/Decoding Unit>

In the above description, an example in which the priority information is included in the bit stream output from the encoding device 11 has been described. However, depending on the encoding device, the priority information may or may not be included in the bit stream.

Therefore, the priority information may be generated in the decoding device 151. For example, the priority information can be generated using information indicating the sound pressure of the audio signal or information indicating the spectral shape, either of which can be extracted from the encoded data of the audio signal included in the bit stream.

In a case where the priority information is generated in the decoding device 151 as described above, the unpacking/decoding unit 161 of the decoding device 151 is configured, for example, as illustrated in FIG. 13. In FIG. 13, elements corresponding to those in FIG. 10 are given the same reference signs, and their description will not be repeated.

The unpacking/decoding unit 161 in FIG. 13 includes the channel audio signal acquisition unit 192, the channel audio signal decoding unit 193, the output selection unit 194, the zero value output unit 195, the IMDCT unit 196, the object audio signal acquisition unit 197, the object audio signal decoding unit 198, the output selection unit 199, the zero value output unit 200, the IMDCT unit 201, a priority information generation unit 231, and a priority information generation unit 232.

The configuration of the unpacking/decoding unit 161 illustrated in FIG. 13 differs from that of the unpacking/decoding unit 161 illustrated in FIG. 10 in that the priority information generation unit 231 and the priority information generation unit 232 are newly provided in place of the priority information acquisition unit 191; the other components are the same as those in FIG. 10.

The channel audio signal acquisition unit 192 acquires the encoded data of each channel from the supplied bit stream and supplies the encoded data to the channel audio signal decoding unit 193 and the priority information generation unit 231.

The priority information generation unit 231 generates the priority information of each channel based on the encoded data of each channel supplied from the channel audio signal acquisition unit 192, and supplies the priority information to the output selection unit 194.

The object audio signal acquisition unit 197 acquires the encoded data of each object from the supplied bit stream and supplies the encoded data to the object audio signal decoding unit 198 and the priority information generation unit 232. In addition, the object audio signal acquisition unit 197 acquires the meta-data of each object from the supplied bit stream and supplies the meta-data to the rendering unit 162.

The priority information generation unit 232 generates the priority information of each object based on the encoded data of each object supplied from the object audio signal acquisition unit 197, and supplies the priority information to the output selection unit 199.

<Description on Selective Decoding Processing>

In a case where the unpacking/decoding unit 161 is configured as illustrated in FIG. 13, the decoding device 151 performs the selective decoding processing illustrated in FIG. 14 as the processing corresponding to STEP S52 of the decoding processing illustrated in FIG. 11. Hereinafter, the selective decoding processing by the decoding device 151 will be described with reference to the flow chart in FIG. 14.

In STEP S131, the priority information generation unit 231 generates the priority information of each channel.

For example, the channel audio signal acquisition unit 192 acquires the encoded data of each channel from the supplied bit stream, and supplies the encoded data to the channel audio signal decoding unit 193 and the priority information generation unit 231.

The priority information generation unit 231 generates the priority information of each channel based on the encoded data of each channel supplied from the channel audio signal acquisition unit 192, and supplies the priority information to the output selection unit 194.

For example, the bit stream contains, as the encoded data of the audio signal, a scale factor for obtaining the MDCT coefficient, side information, and a quantized spectrum. Here, the scale factor is information indicating the sound pressure of the audio signal, and the quantized spectrum is information indicating the spectral shape of the audio signal.

The priority information generation unit 231 generates the priority information of the audio signal of each channel based on the scale factor and the quantized spectrum included in the encoded data of that channel. When the priority information is generated from the scale factor and the quantized spectrum in this way, it can be obtained immediately, before the encoded data is decoded, and thus the amount of calculation for generating the priority information can be decreased.
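The text does not give a formula that maps the scale factor and the quantized spectrum to a priority value. As a rough illustration only, the following Python sketch assumes an AAC-style dequantization rule (|q|^(4/3) scaled by 2^(0.25·(sf − 100))) to estimate each band's energy without an IMDCT, and then maps the frame's mean level linearly onto priorities 0 to 7. The function names, the dB range and the linear mapping are all hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: AAC-style dequantization, value = |q|^(4/3) * 2^(0.25 * (sf - SF_OFFSET)).
SF_OFFSET = 100


def estimate_band_energy(scale_factor: int, quantized: Sequence[int]) -> float:
    """Approximate the energy of one scale factor band without running the IMDCT."""
    gain = 2.0 ** (0.25 * (scale_factor - SF_OFFSET))
    return sum((abs(q) ** (4.0 / 3.0) * gain) ** 2 for q in quantized)


def priority_from_encoded_data(
    bands: List[Tuple[int, Sequence[int]]],
    max_priority: int = 7,
    floor_db: float = -60.0,
    ceil_db: float = 60.0,
) -> int:
    """Map the estimated mean level of one frame linearly onto 0..max_priority.

    bands holds (scale_factor, quantized_values) pairs for one frame.
    The dB range [floor_db, ceil_db] is an illustrative assumption.
    """
    num_coefs = sum(len(q) for _, q in bands)
    energy = sum(estimate_band_energy(sf, q) for sf, q in bands)
    if num_coefs == 0 or energy == 0.0:
        return 0  # silent frame: lowest priority
    level_db = 10.0 * math.log10(energy / num_coefs)
    frac = (level_db - floor_db) / (ceil_db - floor_db)
    return round(min(max(frac, 0.0), 1.0) * max_priority)
```

Because only the integer side data is touched, an estimate of this kind can be computed before any IMDCT, which is the property the paragraph relies on.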

Alternatively, the priority information may be generated based on the sound pressure of the audio signal, which can be obtained by calculating the root-mean-square value of the MDCT coefficients, or based on the spectral shape of the audio signal, which can be obtained from the peak envelope of the MDCT coefficients. In this case, the priority information generation unit 231 decodes the encoded data as appropriate or acquires the MDCT coefficients from the channel audio signal decoding unit 193.
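For the alternative based on MDCT coefficients, a minimal sketch could look like the following. The band size used for the peak envelope, the 0 dB reference (full scale assumed to be 1.0) and the −60 dB floor are illustrative assumptions, not values from the text, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def rms(coefs: Sequence[float]) -> float:
    """Root-mean-square value of the MDCT coefficients (a sound pressure proxy)."""
    return math.sqrt(sum(c * c for c in coefs) / len(coefs)) if coefs else 0.0


def peak_envelope(coefs: Sequence[float], band_size: int = 4) -> List[float]:
    """Largest absolute coefficient in each band of band_size bins (a spectral shape proxy)."""
    return [max(abs(c) for c in coefs[i:i + band_size])
            for i in range(0, len(coefs), band_size)]


def priority_from_mdct(coefs: Sequence[float], max_priority: int = 7,
                       floor_db: float = -60.0) -> int:
    """Map the RMS level (dB relative to an assumed full scale of 1.0) onto 0..max_priority."""
    level = rms(coefs)
    if level == 0.0:
        return 0
    level_db = 20.0 * math.log10(level)
    frac = (level_db - floor_db) / (0.0 - floor_db)
    return round(min(max(frac, 0.0), 1.0) * max_priority)
```

The peak envelope is returned separately here; how it would be weighed against the RMS level is not specified in the text and is left open.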

After the priority information of each channel is obtained, the processing of STEP S132 to STEP S137 is performed. This processing is the same as the processing of STEP S82 to STEP S87 in FIG. 12, so its description is not repeated. In this case, however, since the encoded data of each channel has already been acquired, only the decoding of the encoded data is performed in STEP S134.

In addition, when it is determined in STEP S133 that the channel number is not less than M, the priority information generation unit 232 generates the priority information of the audio signal of each object in STEP S138.

For example, the object audio signal acquisition unit 197 acquires the encoded data of each object from the supplied bit stream, and supplies the encoded data to the object audio signal decoding unit 198 and the priority information generation unit 232. In addition, the object audio signal acquisition unit 197 acquires the meta-data of each object from the supplied bit stream and supplies the meta-data to the rendering unit 162.

The priority information generation unit 232 generates the priority information of each object based on the encoded data of each object supplied from the object audio signal acquisition unit 197, and supplies the priority information to the output selection unit 199. For example, as in the case of each channel, the priority information is generated based on the scale factor and the quantized spectrum.

Alternatively, the priority information may be generated based on the sound pressure or the spectral shape obtained from the MDCT coefficients. In this case, the priority information generation unit 232 decodes the encoded data as appropriate or acquires the MDCT coefficients from the object audio signal decoding unit 198.

After the priority information of each object is obtained, the processing of STEP S139 to STEP S144 is performed, and the selective decoding processing ends. This processing is the same as the processing of STEP S88 to STEP S93 in FIG. 12, so its description is not repeated. In this case, however, since the encoded data of each object has already been acquired, only the decoding of the encoded data is performed in STEP S141.

After the selective decoding processing ends, the process proceeds to STEP S53 in FIG. 11.

As described above, the decoding device 151 generates the priority information of the audio signal of each channel and each object based on the encoded data included in the bit stream. By generating the priority information in the decoding device 151 in this way, appropriate priority information for each audio signal can be obtained with a small amount of calculation, and thus the amount of calculation for decoding or rendering can be decreased. In addition, the deterioration of the quality of the sound reproduced from the audio signal can be minimized.

The priority information may also be generated when the priority information acquisition unit 191 of the unpacking/decoding unit 161 illustrated in FIG. 10 tries to acquire the priority information of the audio signal of each channel and each object from the supplied bit stream but cannot obtain it from the bit stream. In this case, the priority information acquisition unit 191 performs processing similar to that of the priority information generation unit 231 or the priority information generation unit 232, and generates the priority information of the audio signal of each channel and each object from the encoded data.

Third Embodiment

<Threshold Value of Priority Information>

Furthermore, in the description above, for each channel and each object, the audio signal to be decoded, specifically the MDCT coefficient on which the IMDCT is to be performed, is selected by comparing the priority information with the threshold value P or the threshold value Q. However, the threshold value P or the threshold value Q may be changed dynamically for each frame of the audio signal.

For example, the priority information acquisition unit 191 of the unpacking/decoding unit 161 illustrated in FIG. 10 can acquire the priority information of each channel and each object from the bit stream without performing the decoding.

Therefore, for example, the priority information acquisition unit 191 can obtain the distribution of the priority information in the frame subject to processing simply by reading out the priority information of the audio signals of all the channels, without decoding those audio signals. In addition, the decoding device 151 knows its own calculation capability in advance, for example, how many channels it can process simultaneously, that is, in real time.

Therefore, the priority information acquisition unit 191 may determine the threshold value P of the priority information for the frame subject to processing based on the distribution of the priority information in that frame and the calculation capability of the decoding device 151.

For example, the threshold value P is determined such that the largest possible number of audio signals can be decoded while the decoding device 151 still processes them in real time.

In addition, the priority information acquisition unit 191 can dynamically determine the threshold value Q in the same way as the threshold value P. In this case, the priority information acquisition unit 191 obtains the distribution of the priority information from the priority information of the audio signals of all the objects, and determines the threshold value Q for the frame subject to processing based on the obtained distribution and the calculation capability of the decoding device 151.
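The threshold selection can be sketched as choosing the smallest threshold that keeps the number of decoded signals within the decoder's real-time capacity. In this hypothetical Python sketch, max_decodable stands in for the known calculation capability of the decoding device 151, and priorities are assumed to be integers from 0 to 7; the same function would serve for the threshold value P (channels) or Q (objects).

```python
from typing import Sequence


def choose_threshold(priorities: Sequence[int], max_decodable: int,
                     max_priority: int = 7) -> int:
    """Smallest threshold such that at most max_decodable signals have priority >= threshold.

    priorities: priority information of every channel (or object) in the current frame.
    max_decodable: hypothetical real-time capacity of the decoder, in signals per frame.
    """
    for thre in range(max_priority + 2):
        if sum(1 for p in priorities if p >= thre) <= max_decodable:
            return thre
    # No threshold in range fits the capacity: decode nothing.
    return max_priority + 1
```

With priorities [7, 5, 5, 3, 0] and a capacity of three signals, the sketch returns 4, so the three highest-priority signals are decoded. Because signals with equal priority are kept or dropped together, a capacity that falls inside a tie is rounded down; splitting ties would need an additional rule that the text does not describe.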

The threshold value P or the threshold value Q can be determined with a comparatively small amount of calculation.

By dynamically changing the threshold values of the priority information in this way, the decoding can be performed in real time and the deterioration of the quality of the sound reproduced from the audio signal can be minimized. In particular, in this case, it is not necessary to prepare a plurality of priority information items or to provide an identifier for the priority information, so the amount of code of the bit stream can also be reduced.

<Meta-Data of Object>

Furthermore, in the first to third embodiments described above, the meta-data and the priority information of the object for one frame, or the like, are stored in the head element of the bit stream.

In this case, the syntax of the part of the head element of the bit stream in which the meta-data and the priority information of the object are stored is, for example, as illustrated in FIG. 15.

In the example in FIG. 15, only the spatial position information and the priority information of the object for one frame are stored in the meta-data of the object.

In this example, “num_objects” indicates the number of objects. In addition, “object_priority[0]” indicates the priority information of the 0th object. Here, the 0th object means the object specified by an object number.

“position_azimuth[0]” indicates the horizontal angle representing the three-dimensional spatial position of the 0th object as seen from the user who is the listener, that is, as seen from a predetermined reference position. In addition, “position_elevation[0]” indicates the vertical angle representing the three-dimensional spatial position of the 0th object as seen from the listener. Furthermore, “position_radius[0]” indicates the distance from the listener to the 0th object.

Therefore, the position of the object in three-dimensional space is specified by “position_azimuth[0]”, “position_elevation[0]”, and “position_radius[0]”. These information items constitute the spatial position information of the object.

In addition, “gain_factor[0]” indicates the gain of the 0th object.

Thus, in the meta-data illustrated in FIG. 15, “object_priority[0]”, “position_azimuth[0]”, “position_elevation[0]”, “position_radius[0]”, and “gain_factor[0]” of an object are arranged in this order as the data of that object. In the meta-data, the data items of the objects are arranged in an array, for example, in the order of their object numbers.
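To make the field order of FIG. 15 concrete, the following sketch splits an already-decoded flat list of values into per-object records. The flat-list input, the class name and the length check are assumptions for illustration only; the bit widths and coding of the actual syntax are not described here and are not modelled.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Per-object field order as listed for FIG. 15.
FIELDS_PER_OBJECT = 5  # object_priority, azimuth, elevation, radius, gain_factor


@dataclass
class ObjectMetadata:
    object_priority: int
    position_azimuth: float    # horizontal angle seen from the listener
    position_elevation: float  # vertical angle seen from the listener
    position_radius: float     # distance from the listener
    gain_factor: float


def parse_object_metadata(values: Sequence[float]) -> List[ObjectMetadata]:
    """Split a hypothetical flat value list: num_objects, then five fields per object."""
    num_objects = int(values[0])
    expected = 1 + num_objects * FIELDS_PER_OBJECT
    if len(values) < expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    objects = []
    for i in range(num_objects):  # objects appear in object-number order
        base = 1 + i * FIELDS_PER_OBJECT
        prio, azi, ele, rad, gain = values[base:base + FIELDS_PER_OBJECT]
        objects.append(ObjectMetadata(int(prio), azi, ele, rad, gain))
    return objects
```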

Fourth Embodiment

<Noise due to Complete Reconfiguration and Discontinuity of Audio Signal>

The description above covers an example in which the amount of processing at the time of decoding is reduced by omitting the IMDCT and other decoding steps in a case where the priority information of each frame (hereinafter also referred to as a time frame) of each channel or each object, read out from the bit stream in the decoding device 151, is lower than the predetermined threshold value. Specifically, in a case where the priority information is lower than the threshold value, a silent audio signal, that is, zero data, is output from the zero value output unit 195 or the zero value output unit 200 as the audio signal.

However, in this case, the sound quality deteriorates audibly.

Specifically, the sound quality deteriorates because the audio signal is not completely reconfigured, and because of noise such as glitch noise caused by a discontinuity of the signal.

<Sound Quality Deterioration Due to Complete Reconfiguration>

For example, when zero data is output as the audio signal in a case where the priority information is lower than the threshold value, the sound quality deteriorates at the time of switching between the output of the zero data and the output of an ordinary audio signal that is not zero data.

As described above, in the unpacking/decoding unit 161, the IMDCT unit 196 or the IMDCT unit 201 performs the IMDCT on the MDCT coefficient read out from the bit stream for each time frame. Specifically, in the unpacking/decoding unit 161, the audio signal of the present time frame is generated from the result of the IMDCT (or the zero data) for the present time frame and the result of the IMDCT (or the zero data) for the time frame one time frame before.

Here, the generation of the audio signal will be described with reference to FIG. 16, taking the audio signal of an object as an example; the generation of the audio signal of each channel is the same. In the description below, the audio signals output from the zero value output unit 200 and from the IMDCT unit 201 are also referred to as IMDCT signals. Similarly, the audio signals output from the zero value output unit 195 and from the IMDCT unit 196 are also referred to as IMDCT signals.

In FIG. 16, the horizontal direction indicates time, and the rectangles labeled “data[n−1]” to “data[n+2]” represent the bit streams of the time frames (n−1) to (n+2) of a predetermined object, respectively. The value in the bit stream of each time frame indicates the value of the priority information of the object in that time frame. In this example, the value of the priority information of every time frame is 7.

Furthermore, the rectangles labeled “MDCT_coef[q]” (q = n−1, n, ...) in FIG. 16 represent the MDCT coefficients of the time frames (q), respectively.

Now, if the threshold value Q is 4, the priority information value “7” of the time frame (n−1) is equal to or higher than the threshold value Q. Therefore, the IMDCT is performed on the MDCT coefficient of the time frame (n−1). Similarly, the priority information value “7” of the time frame (n) is also equal to or higher than the threshold value Q, so the IMDCT is performed on the MDCT coefficient of the time frame (n).

As a result, an IMDCT signal OPS11 of the time frame (n−1) and an IMDCT signal OPS12 of the time frame (n) are obtained.

In this case, the unpacking/decoding unit 161 sums the former half of the IMDCT signal OPS12 of the time frame (n) and the latter half of the IMDCT signal OPS11 of the time frame (n−1), which is one time frame before the time frame (n), and obtains the audio signal of the time frame (n), that is, the audio signal of the period FL(n). In other words, the parts of the IMDCT signal OPS11 and the IMDCT signal OPS12 in the period FL(n) are overlap-added, and the audio signal of the time frame (n) of the object subject to processing, as it was before encoding, is reproduced.

This processing is necessary for the IMDCT signal to be completely reconfigured into the signal as it was before the MDCT.

However, in the unpacking/decoding unit 161 described above, as illustrated in FIG. 17 for example, the IMDCT signal is not completely reconfigured into the signal before the MDCT at the timing when the output is switched between the IMDCT signal of the IMDCT unit 201 and the IMDCT signal of the zero value output unit 200 according to the priority information of each time frame. That is, if zero data is used instead of the original signal at the time of overlap-addition, the signal is not completely reconfigured. Therefore, the original signal is not reproduced, and the audible quality of the audio signal deteriorates.
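The effect of substituting zero data can be reproduced with a small, self-contained MDCT/IMDCT round trip. This Python sketch illustrates the principle under stated assumptions and is not the decoder's implementation: it uses a sine window (which satisfies the Princen-Bradley condition w[n]² + w[n+N]² = 1), a hop of N samples with 2N-sample blocks, an IMDCT scaled by 2/N to match that window, and a hypothetical zero_frames argument that plays the role of the zero value output unit.

```python
import math
from typing import AbstractSet, List, Sequence


def sine_window(N: int) -> List[float]:
    """Symmetric sine window of length 2N; w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def _basis(N: int, n: int, k: int) -> float:
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def mdct(block: Sequence[float], N: int) -> List[float]:
    """Windowed MDCT: 2N time samples -> N coefficients."""
    w = sine_window(N)
    return [sum(w[n] * block[n] * _basis(N, n, k) for n in range(2 * N))
            for k in range(N)]


def imdct(coef: Sequence[float], N: int) -> List[float]:
    """Windowed IMDCT: N coefficients -> 2N time-aliased samples (2/N scaling)."""
    w = sine_window(N)
    return [w[n] * 2.0 / N * sum(coef[k] * _basis(N, n, k) for k in range(N))
            for n in range(2 * N)]


def encode_decode(signal: Sequence[float], N: int,
                  zero_frames: AbstractSet[int] = frozenset()) -> List[float]:
    """MDCT every 2N block (hop N), then overlap-add the IMDCT outputs.

    Blocks whose index is in zero_frames contribute zero data instead of an
    IMDCT output. len(signal) must be a multiple of N.
    """
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for j in range(len(padded) // N - 1):
        block = padded[j * N:j * N + 2 * N]
        y = [0.0] * (2 * N) if j in zero_frames else imdct(mdct(block, N), N)
        for n in range(2 * N):
            # Each output sample receives the latter half of one block and the
            # former half of the next; their aliasing terms cancel in the sum.
            out[j * N + n] += y[n]
    return out[N:N + len(signal)]
```

With every block decoded, the overlap-added output matches the input to rounding error. When block 2 is replaced by zero data, the 2N samples it spans lose their aliasing-cancellation partner and no longer match the input, while samples covered only by decoded blocks are still reconstructed exactly.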

In the example of FIG. 17, parts corresponding to those in FIG. 16 are denoted by the same reference signs, and their description is not repeated.

In FIG. 17, the value of the priority information of the time frame (n−1) is “7”, but the priority information of each of the time frames (n) to (n+2) is the lowest value, “0”.

Therefore, if the threshold value Q is 4, the IMDCT unit 201 performs the IMDCT on the MDCT coefficient of the time frame (n−1), and the IMDCT signal OPS21 of the time frame (n−1) is obtained. On the other hand, the IMDCT is not performed on the MDCT coefficient of the time frame (n), and the zero data output from the zero value output unit 200 becomes the IMDCT signal OPS22 of the time frame (n).

In this case, the former half of the zero data, which is the IMDCT signal OPS22 of the time frame (n), and the latter half of the IMDCT signal OPS21 of the time frame (n−1), which is one frame before the time frame (n), are summed, and the result becomes the final audio signal of the time frame (n). That is, the parts of the IMDCT signal OPS22 and the IMDCT signal OPS21 in the period FL(n) are overlap-added, and the result becomes the final audio signal of the time frame (n) of the object subject to processing.

In this way, when the output source of the IMDCT signal is switched from the IMDCT unit 201 to the zero value output unit 200 or from the zero value output unit 200 to the IMDCT unit 201, the IMDCT signal from the IMDCT unit 201 is not completely reconfigured, and the sound quality deteriorates audibly.

<Sound Quality Deterioration due to Noise Caused by Discontinuity>

In addition, in a case where the output source of the IMDCT signal is switched from the IMDCT unit 201 to the zero value output unit 200 or from the zero value output unit 200 to the IMDCT unit 201, the signal is not completely reconfigured, so in some cases the signal is discontinuous at the connection between the IMDCT signal obtained by the IMDCT and the IMDCT signal that is zero data. As a result, glitch noise occurs at the connection, and the audible quality of the audio signal deteriorates.

Furthermore, to improve the sound quality, the unpacking/decoding unit 161 may perform spectral band replication (SBR) processing or the like on the audio signal obtained by overlap-adding the IMDCT signals output from the IMDCT unit 201 and the zero value output unit 200.

Various kinds of processing can be considered for the stage subsequent to the IMDCT unit 201 or the zero value output unit 200; hereinafter, the description continues with SBR as an example.

In SBR, the high frequency component of the original audio signal before encoding is generated from the low frequency audio signal obtained by the overlap-addition and from high frequency power values stored in the bit stream.

Specifically, the audio signal of one frame is divided into several sections called time slots, and the audio signal of each time slot is band-divided into signals of a plurality of low frequency sub-bands (hereinafter referred to as low frequency sub-band signals).

Then, a signal of each high frequency sub-band (hereinafter referred to as a high frequency sub-band signal) is generated based on the low frequency sub-band signal of each sub-band and the power value of each sub-band on the high frequency side. For example, a target high frequency sub-band signal is generated by adjusting the power of the low frequency sub-band signal of a predetermined sub-band to the power value of the target high frequency sub-band, or by shifting its frequency.
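The power-adjustment step can be illustrated with real-valued samples. The sketch below is a simplification under stated assumptions: actual SBR operates on complex QMF sub-band samples and uses more elaborate patching, frequency shifting is represented only by reusing the low band's samples, and the function names are hypothetical. It also shows why zero data in the low band yields zero data in the high band.

```python
import math
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Mean power of a sub-band signal (0.0 for an empty signal)."""
    return sum(v * v for v in x) / len(x) if x else 0.0


def generate_high_subband(low_subband: Sequence[float],
                          target_power: float) -> List[float]:
    """Copy a low frequency sub-band signal and scale it to the high band's power value."""
    power = mean_power(low_subband)
    if power == 0.0:
        return [0.0] * len(low_subband)  # zero data in -> zero data out
    gain = math.sqrt(target_power / power)
    return [gain * v for v in low_subband]
```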

Furthermore, the high frequency sub-band signals and the low frequency sub-band signals are synthesized to generate an audio signal including the high frequency component for each time slot, and the audio signals generated for the individual time slots are combined into the audio signal of one time frame including the high frequency component.

In a case where such SBR is performed in the stage subsequent to the IMDCT unit 201 or the zero value output unit 200, the high frequency component is generated by SBR for the audio signal made from the IMDCT signal output from the IMDCT unit 201. However, since the IMDCT signal output from the zero value output unit 200 is zero data, the high frequency component obtained by SBR for the audio signal made from that IMDCT signal is also zero data.

Then, when the output source of the IMDCT signal is switched from the IMDCT unit 201 to the zero value output unit 200 or from the zero value output unit 200 to the IMDCT unit 201, the signal also becomes discontinuous at the connection on the high frequency side. In such a case, glitch noise occurs and the sound quality deteriorates audibly.

Therefore, in the present technology, the output destination of the MDCT coefficient is selected in consideration of the previous and next time frames, and fade-in processing and fade-out processing are performed on the audio signal. This suppresses the audible sound quality deterioration described above and improves the sound quality.

<Selection of Output Destination of MDCT Coefficient Considering Previous and Next Time Frames>

First, the selection of the output destination of the MDCT coefficient in consideration of the previous and next time frames will be described. Here too, the audio signal of an object is used as an example; the same applies to the audio signal of each channel. The processing described below is performed for each object and each channel.

For example, in the embodiments described above, the output selection unit 199 switches the output destination of the MDCT coefficient of each object based on the priority information of the present time frame. In the present embodiment, by contrast, the output selection unit 199 switches the output destination of the MDCT coefficient based on the priority information of three temporally consecutive time frames: the present time frame, the time frame one time frame before it, and the time frame one time frame after it. In other words, whether the encoded data is decoded is selected based on the priority information of three consecutive time frames.

Specifically, in a case where the conditional formula indicated in the following Formula (1) is satisfied for the object subject to processing, the output selection unit 199 supplies the MDCT coefficient of the time frame (n) of that object to the IMDCT unit 201.

[Math.1]

$$(\mathrm{object\_priority}[n-1] \geq \mathrm{thre}) \lor (\mathrm{object\_priority}[n] \geq \mathrm{thre}) \lor (\mathrm{object\_priority}[n+1] \geq \mathrm{thre}) \qquad (1)$$

In Formula (1), object_priority[q] (where q = n−1, n, n+1) indicates the priority information of the time frame (q), and thre indicates the threshold value Q.

Therefore, if at least one of the three consecutive time frames (the present time frame and the time frames immediately before and after it) has priority information equal to or higher than the threshold value Q, the IMDCT unit 201 is selected as the supply destination of the MDCT coefficient. In this case, the encoded data is decoded, specifically, the IMDCT is performed on the MDCT coefficient. On the other hand, if the priority information of all three time frames is lower than the threshold value Q, the MDCT coefficient is supplied to the zero value output unit 200, which outputs zero data. In this case, the encoded data is substantially not decoded, that is, the IMDCT is not performed on the MDCT coefficient.
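Formula (1) translates directly into a predicate over three consecutive priority values. In this sketch the function name is hypothetical, and the handling of the first and last frames (missing neighbours are ignored) is an assumption, since the text does not address stream boundaries.

```python
from typing import Sequence


def decode_frame(priorities: Sequence[int], n: int, thre: int) -> bool:
    """Formula (1): decode frame n if any of frames n-1, n, n+1 has priority >= thre.

    True means the MDCT coefficient goes to the IMDCT unit; False means the
    zero value output unit supplies zero data instead. Neighbours outside the
    stream are ignored (assumption).
    """
    neighbours = (priorities[q] for q in (n - 1, n, n + 1)
                  if 0 <= q < len(priorities))
    return any(p >= thre for p in neighbours)
```

For the upper diagram of FIG. 18 (priorities 7, 0, 0, 0 for the time frames (n−1) to (n+2), with Q = 4), the sketch decodes only the time frames (n−1) and (n); for the lower diagram (0, 0, 0, 7), it decodes only the time frames (n+1) and (n+2), matching the description. Deciding frame n requires the priority of frame n+1, which is the one-frame pre-reading discussed below.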

In this way, as illustrated in FIG. 18, the audio signal is completely reconfigured from the IMDCT signals, and the audible deterioration of the sound quality is suppressed. In FIG. 18, the parts corresponding to those in FIG. 16 are denoted by the same reference signs, and their description is not repeated.

In the example illustrated in the upper diagram of FIG. 18, the value of the priority information of each time frame is the same as in the example illustrated in FIG. 17. Assuming that the threshold value Q is 4, in the upper diagram of FIG. 18 the priority information of the time frame (n−1) is equal to or higher than the threshold value Q, but the priority information of each of the time frames (n) to (n+2) is lower than the threshold value Q.

Consequently, according to the conditional formula of Formula (1), the IMDCT is performed on the MDCT coefficients of the time frame (n−1) and the time frame (n), and an IMDCT signal OPS31 and an IMDCT signal OPS32 are obtained, respectively. On the other hand, in the time frame (n+1), where the conditional formula is not satisfied, the IMDCT is not performed on the MDCT coefficient, and the zero data becomes an IMDCT signal OPS33.

Therefore, the audio signal of the time frame (n), which is not completely reconfigured in the example illustrated in FIG. 17, is completely reconfigured in the example illustrated in the upper diagram of FIG. 18, and the audible deterioration of the sound quality is suppressed. However, since the audio signal is not completely reconfigured in the next time frame (n+1) in this example, the fade-out processing described below is performed in the time frame (n) and the time frame (n+1) to suppress the audible deterioration of the sound quality.

In the example illustrated in the lower diagram of FIG. 18, the priority information of each of the time frames (n−1) to (n+1) is lower than the threshold value Q, and the priority information of the time frame (n+2) is equal to or higher than the threshold value Q.

Consequently, according to the conditional formula of Formula (1), the IMDCT is not performed on the MDCT coefficient in the time frame (n), where the conditional formula is not satisfied, and the zero data becomes an IMDCT signal OPS41. On the other hand, the IMDCT is performed on the MDCT coefficients of the time frame (n+1) and the time frame (n+2), and an IMDCT signal OPS42 and an IMDCT signal OPS43 are obtained, respectively.

In this example, the audio signal can be completely reconfigured in the time frame (n+2), where the value of the priority information switches from a value lower than the threshold value Q to a value equal to or higher than the threshold value Q. Therefore, the audible deterioration of the sound quality can be suppressed. Even in this case, however, since the audio signal of the time frame (n+1) immediately before the time frame (n+2) is not completely reconfigured, the fade-in processing described below is performed in the time frame (n+1) and the time frame (n+2) to suppress the audible deterioration of the sound quality.

Here, the priority information is pre-read for only one time frame, and the output destination of the MDCT coefficient is selected from the priority information of three consecutive time frames. For this reason, in the example illustrated in the upper diagram of FIG. 18, the fade-out processing is performed in the time frame (n) and the time frame (n+1), and in the example illustrated in the lower diagram of FIG. 18, the fade-in processing is performed in the time frame (n+1) and the time frame (n+2).

However, in a case where the priority information can be pre-read for two time frames, the fade-out processing may be performed in the time frame (n+1) and the time frame (n+2) in the example illustrated in the upper diagram of FIG. 18, and the fade-in processing may be performed in the time frame (n) and the time frame (n+1) in the example illustrated in the lower diagram of FIG. 18.

<Fade-In Processing and Fade-Out Processing>

Next, the fade-in processing and the fade-out processing on the audio signal will be described. Here too, the audio signal of an object is used as an example; the same applies to the audio signal of each channel. The fade-in processing and the fade-out processing are performed for each object and each channel.

In the present technology, as in the example illustrated in FIG. 18, the fade-in processing or the fade-out processing is performed in the time frame in which the IMDCT signal obtained by the IMDCT and the IMDCT signal that is zero data are overlap-added, and in the time frame before or after that time frame.

In the fade-in processing, the gain of the audio signal is adjusted so that the amplitude (magnitude) of the audio signal of the time frame increases with time. Conversely, in the fade-out processing, the gain of the audio signal is adjusted so that the amplitude (magnitude) of the audio signal of the time frame decreases with time.

In this way, even in a case where the connection between the IMDCT signal obtained by the IMDCT and the IMDCT signal that is zero data is discontinuous, the audible deterioration of the sound quality can be suppressed. Hereinafter, the gain value by which the audio signal is multiplied in such gain adjustment is referred to as a fading signal gain.

Furthermore, in the present technology, the fade-in processing or the fade-out processing is also performed in the SBR for the connection between the IMDCT signal obtained by the IMDCT and the IMDCT signal that is zero data.

That is, in SBR, the power value of each high frequency sub-band is used for each time slot. In the present technology, the power value of each high frequency sub-band is multiplied by a gain value determined for the fade-in processing or the fade-out processing for each time slot, and then SBR is performed. In other words, the gain of the high frequency power value is adjusted.

Hereinafter, the gain value that is determined for each time slot and by which the power value of each high frequency sub-band is multiplied is referred to as a fading SBR gain.

Specifically, the fading SBR gain for the fade-in processing is determined so as to increase with time, that is, so that the fading SBR gain of each time slot is larger than that of the preceding time slot. Conversely, the fading SBR gain for the fade-out processing is determined so as to decrease with time, that is, so that the fading SBR gain of each time slot is smaller than that of the preceding time slot.

By also performing the fade-in processing or the fade-out processing at the time of SBR in this way, the audible deterioration of the sound quality can be suppressed even when the high frequency component is discontinuous.

Specifically, the processing illustrated in, for example, FIG. 19 and FIG. 20 is performed as the gain adjustment, such as the fade-in processing or the fade-out processing, on the audio signal and the high frequency power value. In FIG. 19 and FIG. 20, the parts corresponding to those in FIG. 18 are denoted by the same reference letters or signs, and their description is not repeated.

The example in FIG. 19 corresponds to the example illustrated in the upper diagram of FIG. 18. In this example, the audio signals of the time frame (n) and the time frame (n+1) are multiplied by the fading signal gain indicated by a polygonal line GN11.

The value of the fading signal gain indicated by the polygonal line GN11 changes linearly from “1” to “0” with time in the portion of the time frame (n), and stays at “0” throughout the portion of the time frame (n+1). Therefore, since adjusting the gain of the audio signal with the fading signal gain makes the audio signal change gradually to zero data, the audible deterioration of the sound quality can be suppressed.
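The two-frame gain curves described for FIG. 19 and FIG. 20 can be generated as sample-wise lists. This sketch assumes that one list element corresponds to one sample and that the linear ramp reaches exactly 0 (for fade-out) or starts exactly at 0 (for fade-in) at the frame boundary; the function names are hypothetical.

```python
from typing import List, Sequence


def fade_out_gain(frame_len: int) -> List[float]:
    """GN11-style gain over two frames: linear 1 -> 0 in the first, then 0 throughout the second."""
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    ramp = [1.0 - i / (frame_len - 1) for i in range(frame_len)]
    return ramp + [0.0] * frame_len


def fade_in_gain(frame_len: int) -> List[float]:
    """GN21-style gain over two frames: 0 throughout the first, then linear 0 -> 1 in the second."""
    return list(reversed(fade_out_gain(frame_len)))


def apply_gain(signal: Sequence[float], gain: Sequence[float]) -> List[float]:
    """Multiply the audio signal sample by sample by the fading signal gain."""
    return [s * g for s, g in zip(signal, gain)]
```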

In addition, in this example, the high frequency power value of each time slot of the time frame (n) is multiplied by the fading SBR gain indicated by an arrow GN12.

The value of the fading SBR gain illustrated by the arrow GN12 changes from “1” to “0” with time, becoming smaller in each successive time slot. Therefore, since adjusting the high frequency gain using the fading SBR gain makes the high frequency component of the audio signal change gradually to the zero data, it is possible to suppress the deterioration of the sound quality when listening.

On the other hand, the example illustrated in FIG. 20 corresponds to the example illustrated in the lower diagram of FIG. 18. In this example, the audio signals of the time frame (n+1) and the time frame (n+2) are multiplied by the fading signal gain illustrated by a polygonal line GN21.

The value of the fading signal gain illustrated by the polygonal line GN21 remains “0” throughout the time frame (n+1), and changes linearly from “0” to “1” with time over the time frame (n+2). Therefore, since adjusting the gain of the audio signal using the fading signal gain makes the audio signal change gradually from the zero data to the original signal, it is possible to suppress the deterioration of the sound quality when listening.

In addition, in this example, the high frequency power value of each time slot of the time frame (n+2) is multiplied by the fading SBR gain illustrated by an arrow GN22.

The value of the fading SBR gain illustrated by the arrow GN22 changes from “0” to “1” with time, becoming larger in each successive time slot. Therefore, since adjusting the high frequency gain using the fading SBR gain makes the high frequency component of the audio signal change gradually from the zero data to the original signal, it is possible to suppress the deterioration of the sound quality when listening.
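The per-time-slot behavior of the fading SBR gain (arrows GN12 and GN22) can be illustrated with a minimal sketch. It assumes that the gain steps linearly between “1” and “0” across the time slots of one frame; the text only requires that it decrease (fade-out) or increase (fade-in) from slot to slot. The function names, the number of time slots, and the `power[slot][band]` layout are hypothetical.

```python
from typing import List


def fading_sbr_gains(num_slots: int, fade_in: bool) -> List[float]:
    """One fading SBR gain per time slot (assumed linear steps between 0 and 1)."""
    if num_slots < 2:
        raise ValueError("need at least two time slots")
    ramp = [k / (num_slots - 1) for k in range(num_slots)]  # 0 ... 1
    return ramp if fade_in else [1.0 - r for r in ramp]     # GN22 or GN12 shape


def scale_high_band_power(power: List[List[float]], gains: List[float]) -> List[List[float]]:
    """Multiply the power value of every high frequency sub-band in a slot by that slot's gain."""
    return [[p * g for p in slot] for slot, g in zip(power, gains)]
```

With three time slots, the fade-out gains are 1.0, 0.5 and 0.0, so every sub-band power in the last slot becomes zero before the SBR uses it.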

<Configuration Example of Unpacking and Decoding Unit>

In a case where the selection of the output destination of the MDCT coefficient and the gain adjustment, such as the fade-in processing or the fade-out processing, are performed as described above, the unpacking/decoding unit 161 is configured as illustrated in FIG. 21. In FIG. 21, the parts corresponding to those in FIG. 10 are denoted by the same signs, and the description thereof will not be repeated.

The unpacking/decoding unit 161 in FIG. 21 includes the priority information acquisition unit 191, the channel audio signal acquisition unit 192, the channel audio signal decoding unit 193, the output selection unit 194, the zero value output unit 195, the IMDCT unit 196, an overlap adding unit 271, a gain adjustment unit 272, an SBR processing unit 273, the object audio signal acquisition unit 197, the object audio signal decoding unit 198, the output selection unit 199, the zero value output unit 200, the IMDCT unit 201, an overlap adding unit 274, a gain adjustment unit 275, and an SBR processing unit 276.

The configuration of the unpacking/decoding unit 161 illustrated in FIG. 21 is the configuration of the unpacking/decoding unit 161 illustrated in FIG. 10 with the components from the overlap adding unit 271 to the SBR processing unit 276 additionally provided.

The overlap adding unit 271 performs the overlapping addition of the IMDCT signals (audio signals) supplied from the zero value output unit 195 or the IMDCT unit 196 to generate the audio signal of each time frame, and supplies the audio signal to the gain adjustment unit 272.

The gain adjustment unit 272 adjusts the gain of the audio signal supplied from the overlap adding unit 271 based on the priority information supplied from the priority information acquisition unit 191, and supplies the result to the SBR processing unit 273.

The SBR processing unit 273 acquires the power value of each high frequency sub-band for each time slot from the priority information acquisition unit 191, and adjusts the gain of the high frequency power value based on the priority information supplied from the priority information acquisition unit 191. In addition, the SBR processing unit 273 performs the SBR on the audio signal supplied from the gain adjustment unit 272 using the gain-adjusted high frequency power value, and supplies the audio signal obtained as a result of the SBR to the mixing unit 163.

The overlap adding unit 274 performs the overlapping addition of the IMDCT signals (audio signals) supplied from the zero value output unit 200 or the IMDCT unit 201 to generate the audio signal of each time frame, and supplies the audio signal to the gain adjustment unit 275.

The gain adjustment unit 275 adjusts the gain of the audio signal supplied from the overlap adding unit 274 based on the priority information supplied from the priority information acquisition unit 191, and supplies the audio signal to the SBR processing unit 276.

The SBR processing unit 276 acquires the power value of each high frequency sub-band for each time slot from the priority information acquisition unit 191, and adjusts the gain of the high frequency power value based on the priority information supplied from the priority information acquisition unit 191. In addition, the SBR processing unit 276 performs the SBR on the audio signal supplied from the gain adjustment unit 275 using the gain-adjusted high frequency power value, and supplies the audio signal obtained as a result of the SBR to the rendering unit 162.

<Description on Selective Decoding Processing>

Subsequently, the operation of the decoding device 151 in a case where the unpacking/decoding unit 161 has the configuration illustrated in FIG. 21 will be described. In this case, the decoding device 151 performs the decoding processing described with reference to FIG. 11. However, the processing illustrated in FIG. 22 is performed as the selective decoding processing in STEP S52.

Hereinafter, the selective decoding processing corresponding to the processing in STEP S52 in FIG. 11 will be described with reference to the flowchart in FIG. 22.

In STEP S181, the priority information acquisition unit 191 acquires the high frequency power value of the audio signal of each channel from the supplied bit stream and supplies it to the SBR processing unit 273, and acquires the high frequency power value of the audio signal of each object from the supplied bit stream and supplies it to the SBR processing unit 276.

After the high frequency power value is acquired, the processing tasks in STEP S182 to STEP S187 are performed, and the audio signal (IMDCT signal) of the channel subject to be processed is generated. However, those processing tasks are similar to those in STEP S81 to STEP S86 in FIG. 12, and the description thereof will not be repeated.

However, in STEP S186, in a case where a condition similar to Formula (1) described above is satisfied, that is, in a case where at least one of the priority information of the present time frame of the channel subject to be processed and the priority information items of the time frames immediately before and immediately after the present time frame is equal to or higher than the threshold value P, it is determined that the priority information is equal to or higher than the threshold value P. In addition, the IMDCT signal generated in the zero value output unit 195 or the IMDCT unit 196 is output to the overlap adding unit 271.
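The determination in STEP S186 and the resulting choice between the IMDCT unit and the zero value output unit can be sketched as follows. This is a minimal illustration of the condition as worded here, not Formula (1) itself; it assumes integer priority values, and `imdct` is a hypothetical stand-in for the IMDCT unit 196. It also assumes the usual MDCT property that N coefficients yield a 2N-sample IMDCT signal.

```python
from typing import Callable, List, Sequence


def satisfies_formula_1(p_prev: int, p_cur: int, p_next: int, threshold: int) -> bool:
    """True when at least one of three consecutive frames reaches the threshold."""
    return max(p_prev, p_cur, p_next) >= threshold


def select_imdct_output(
    mdct_coeffs: Sequence[float],
    p_prev: int,
    p_cur: int,
    p_next: int,
    threshold: int,
    imdct: Callable[[Sequence[float]], List[float]],
) -> List[float]:
    """Route the frame to the IMDCT path or to the zero value output path."""
    if satisfies_formula_1(p_prev, p_cur, p_next, threshold):
        return imdct(mdct_coeffs)  # decoded IMDCT signal
    # Zero data of the same length as an IMDCT signal (2N samples for N coefficients).
    return [0.0] * (2 * len(mdct_coeffs))
```

A frame whose own priority is below P is therefore still decoded when a neighboring frame is at or above P, which is what keeps the overlapping addition at a priority switch complete.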

In a case where it is not determined in STEP S186 that the priority information is equal to or higher than the threshold value P, or after the IMDCT signal is generated in STEP S187, the processing in STEP S188 is performed.

In STEP S188, the overlap adding unit 271 performs the overlapping addition of the IMDCT signals supplied from the zero value output unit 195 or the IMDCT unit 196, and supplies the audio signal of the present time frame obtained as a result of the overlapping addition to the gain adjustment unit 272.

Specifically, for example, as described with reference to FIG. 18, the former half of the IMDCT signal of the present time frame and the latter half of the IMDCT signal of the time frame immediately before the present time frame are summed to form the audio signal of the present time frame.
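The overlapping addition described above can be sketched as a half-frame sum. This minimal version assumes that both IMDCT signals have the same even length and that any synthesis windowing has already been applied; windowing is not modeled, and the function name is hypothetical.

```python
from typing import List, Sequence


def overlap_add(prev_imdct: Sequence[float], cur_imdct: Sequence[float]) -> List[float]:
    """Latter half of the previous IMDCT signal plus former half of the present one."""
    if len(prev_imdct) != len(cur_imdct) or len(cur_imdct) % 2:
        raise ValueError("IMDCT signals must have the same even length")
    half = len(cur_imdct) // 2
    return [a + b for a, b in zip(prev_imdct[half:], cur_imdct[:half])]
```

For example, `overlap_add([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])` gives `[13.0, 24.0]`. If the present IMDCT signal is zero data, the result still carries the latter half of the previous frame, which is the incompletely reconstructed portion that the fading signal gain is meant to smooth.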

In STEP S189, the gain adjustment unit 272 adjusts the gain of the audio signal supplied from the overlap adding unit 271 based on the priority information of the channel subject to be processed supplied from the priority information acquisition unit 191, and supplies the result of the gain adjustment to the SBR processing unit 273.

Specifically, in a case where the priority information of the time frame immediately before the present time frame is equal to or higher than the threshold value P and the priority information of the present time frame and the priority information of the time frame immediately after the present time frame are lower than the threshold value P, the gain adjustment unit 272 adjusts the gain of the audio signal with the fading signal gain illustrated by the polygonal line GN11 in FIG. 19. In this case, the time frame (n) in FIG. 19 corresponds to the present time frame, and in the time frame immediately after the present time frame, the gain adjustment with a fading signal gain of zero is performed, as illustrated by the polygonal line GN11.

In addition, in a case where the priority information of the present time frame is equal to or higher than the threshold value P and the priority information items of the two time frames immediately before the present time frame are lower than the threshold value P, the gain adjustment unit 272 adjusts the gain of the audio signal with the fading signal gain illustrated by the polygonal line GN21 in FIG. 20. In this case, the time frame (n+2) in FIG. 20 corresponds to the present time frame, and in the time frame immediately before the present time frame, the gain adjustment with a fading signal gain of zero is performed, as illustrated by the polygonal line GN21.

The gain adjustment unit 272 performs the gain adjustment only in the two cases described above; in other cases, it does not perform the gain adjustment and supplies the audio signal to the SBR processing unit 273 as it is.
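The two cases above, together with the zero-gain frame that accompanies each of them, can be expressed as one per-frame decision. This is a minimal sketch, assuming integer priorities and a linear ramp that spans one whole frame as drawn by GN11 and GN21; the function name and the frame length are hypothetical. The zero-gain conditions are derived from the statements that the frame after a fade-out frame and the frame before a fade-in frame receive a fading signal gain of zero.

```python
from typing import List, Optional


def _ramp(start: float, stop: float, n: int) -> List[float]:
    """Linear ramp over n samples with both end points included."""
    if n < 2:
        raise ValueError("a ramp needs at least two samples")
    step = (stop - start) / (n - 1)
    values = [start + step * i for i in range(n)]
    values[-1] = stop  # pin the end point exactly
    return values


def frame_fading_gain(p_prev2: int, p_prev: int, p_cur: int, p_next: int,
                      threshold: int, frame_len: int) -> Optional[List[float]]:
    """Fading signal gain for the present frame, or None to pass the audio through."""
    t = threshold
    if p_prev >= t and p_cur < t and p_next < t:
        return _ramp(1.0, 0.0, frame_len)  # fade-out frame: GN11 over frame (n)
    if p_cur >= t and p_prev < t and p_prev2 < t:
        return _ramp(0.0, 1.0, frame_len)  # fade-in frame: GN21 over frame (n+2)
    after_fade_out = p_prev2 >= t and p_prev < t and p_cur < t
    before_fade_in = p_next >= t and p_cur < t and p_prev < t
    if after_fade_out or before_fade_in:
        return [0.0] * frame_len  # GN11 over frame (n+1) or GN21 over frame (n+1)
    return None  # no gain adjustment
```

With a threshold of 4 and hypothetical priorities 0, 7, 3, 2 for the frames (n−2) to (n+1), the present frame (n) receives the descending ramp; the same decision, evaluated on time slots rather than samples, would drive the fading SBR gain in the SBR processing unit 273.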

In STEP S190, the SBR processing unit 273 performs the SBR on the audio signal supplied from the gain adjustment unit 272 based on the high frequency power value and the priority information of the channel subject to be processed supplied from the priority information acquisition unit 191.

Specifically, in a case where the priority information of the time frame immediately before the present time frame is equal to or higher than the threshold value P and the priority information of the present time frame and the priority information of the time frame immediately after the present time frame are lower than the threshold value P, the SBR processing unit 273 adjusts the gain of the high frequency power value with the fading SBR gain illustrated by the arrow GN12 in FIG. 19. That is, the high frequency power value is multiplied by the fading SBR gain.

Then, the SBR processing unit 273 performs the SBR using the gain-adjusted high frequency power value, and supplies the audio signal obtained as a result of the SBR to the mixing unit 163. In this case, the time frame (n) in FIG. 19 corresponds to the present time frame.

In addition, in a case where the priority information of the present time frame is equal to or higher than the threshold value P and the priority information items of the two time frames immediately before the present time frame are lower than the threshold value P, the SBR processing unit 273 adjusts the gain of the high frequency power value with the fading SBR gain illustrated by the arrow GN22 in FIG. 20. Then, the SBR processing unit 273 performs the SBR using the gain-adjusted high frequency power value, and supplies the audio signal obtained as a result of the SBR to the mixing unit 163. In this case, the time frame (n+2) in FIG. 20 corresponds to the present time frame.

The SBR processing unit 273 performs the gain adjustment of the high frequency power value only in the two cases described above; in other cases, it does not perform the gain adjustment, performs the SBR using the acquired high frequency power value as it is, and supplies the audio signal obtained as a result of the SBR to the mixing unit 163.

After the SBR is performed and the audio signal of the present time frame is obtained, the processing tasks in STEP S191 to STEP S196 are performed. However, those processing tasks are similar to those in STEP S87 to STEP S92 in FIG. 12, and the description thereof will not be repeated.

However, in STEP S195, in a case where the condition of Formula (1) described above is satisfied, it is determined that the priority information is equal to or higher than the threshold value Q. In addition, the IMDCT signal (audio signal) generated in the zero value output unit 200 or the IMDCT unit 201 is output to the overlap adding unit 274.

In this way, when the IMDCT signal of the present time frame is obtained, the processing tasks in STEP S197 to STEP S199 are performed and the audio signal of the present time frame is generated. However, those processing tasks are similar to those in STEP S188 to STEP S190, and the description thereof will not be repeated.

In STEP S200, the object audio signal acquisition unit 197 adds one to the object number, and the process returns to STEP S193. Then, when it is determined in STEP S193 that the object number is not less than N, the selective decoding processing ends, and the process proceeds to STEP S53 in FIG. 11.

As described above, the unpacking/decoding unit 161 selects the output destination of the MDCT coefficient according to the priority information items of the present time frame and the time frames before and after it. In this way, the audio signal is completely reconstructed in the portion where a time frame in which the priority information is equal to or higher than the threshold value and a time frame in which the priority information is lower than the threshold value are switched, and thus, it is possible to suppress the deterioration of the sound quality when listening.

In addition, the unpacking/decoding unit 161 adjusts the gain of the overlap-added audio signal or the high frequency power value based on the priority information items of three consecutive time frames. That is, the fade-in processing or the fade-out processing is appropriately performed. In this way, the occurrence of glitch noise is suppressed, and thus, it is possible to suppress the deterioration of the sound quality when listening.

Fifth Embodiment

<Fade-In Processing and Fade-Out Processing>

In the description of the fourth embodiment, the gain adjustment is performed on the overlap-added audio signal, and further, the gain adjustment is performed on the high frequency power value at the time of SBR. In this case, separate gain adjustments, that is, separate fade-in processing and fade-out processing, are performed on the low frequency component and the high frequency component of the final audio signal.

Here, instead of performing the gain adjustment immediately after the overlapping addition and at the time of SBR, the gain adjustment may be performed on the audio signal obtained by the SBR, so that the fade-in processing and the fade-out processing can be realized with less processing.

In such a case, for example, the gain adjustment is performed as illustrated in FIG. 23 and FIG. 24. In FIG. 23 and FIG. 24, the parts corresponding to those in FIG. 19 and FIG. 20 are denoted by the same letters or the like, and the description thereof will not be repeated.

The change of the priority information in the example illustrated in FIG. 23 is the same as that in the example illustrated in FIG. 19. In this example, if the threshold value Q is 4, the priority information of the time frame (n−1) is equal to or higher than the threshold value Q, but the priority information items of the time frame (n) to the time frame (n+2) are lower than the threshold value Q.

In this case, the gain adjustment is performed by multiplying the audio signals obtained by the SBR in the time frame (n) and the time frame (n+1) by the fading signal gain illustrated by a polygonal line GN31.

The fading signal gain illustrated by the polygonal line GN31 is the same as the fading signal gain illustrated by the polygonal line GN11 in FIG. 19. However, in the example in FIG. 23, since the audio signal subject to the gain adjustment includes both the low frequency component and the high frequency component, the gains of the low frequency component and the high frequency component can be adjusted with a single fading signal gain.

By adjusting the gain of the audio signal using the fading signal gain, the audio signal gradually changes to the zero data in the portion where the IMDCT signal obtained by the IMDCT and the IMDCT signal that is the zero data are overlap-added and in the portion immediately before it. In this way, it is possible to suppress the deterioration of the sound quality when listening.

On the other hand, the change of the priority information in the example illustrated in FIG. 24 is the same as that in the example illustrated in FIG. 20. In this example, if the threshold value Q is 4, the priority information items of the time frame (n) and the time frame (n+1) are lower than the threshold value Q, but the priority information of the time frame (n+2) is equal to or higher than the threshold value Q.

In such a case, the gain is adjusted by multiplying the audio signals obtained by the SBR in the time frame (n+1) and the time frame (n+2) by the fading signal gain illustrated by a polygonal line GN41.

The fading signal gain illustrated by the polygonal line GN41 is the same as the fading signal gain illustrated by the polygonal line GN21 in FIG. 20. However, in the example in FIG. 24, since the audio signal subject to the gain adjustment includes both the low frequency component and the high frequency component, the gains of the low frequency component and the high frequency component can be adjusted with a single fading signal gain.

By adjusting the gain of the audio signal using the fading signal gain, the audio signal gradually changes from the zero data to the original signal in the portion where the IMDCT signal obtained by the IMDCT and the IMDCT signal that is the zero data are overlap-added and in the portion immediately after it. In this way, it is possible to suppress the deterioration of the sound quality when listening.

<Configuration Example of Unpacking/Decoding Unit>

In a case where the gain adjustment by the fade-in processing or the fade-out processing described above with reference to FIG. 23 and FIG. 24 is performed, the unpacking/decoding unit 161 is configured, for example, as illustrated in FIG. 25. In FIG. 25, the parts corresponding to those in FIG. 21 are denoted by the same signs, and the description thereof will not be repeated.

The unpacking/decoding unit 161 illustrated in FIG. 25 includes the priority information acquisition unit 191, the channel audio signal acquisition unit 192, the channel audio signal decoding unit 193, the output selection unit 194, the zero value output unit 195, the IMDCT unit 196, the overlap adding unit 271, the SBR processing unit 273, the gain adjustment unit 272, the object audio signal acquisition unit 197, the object audio signal decoding unit 198, the output selection unit 199, the zero value output unit 200, the IMDCT unit 201, the overlap adding unit 274, the SBR processing unit 276, and the gain adjustment unit 275.

The configuration of the unpacking/decoding unit 161 illustrated in FIG. 25 differs from that of the unpacking/decoding unit 161 illustrated in FIG. 21 in that the gain adjustment unit 272 and the gain adjustment unit 275 are disposed at the stages after the SBR processing unit 273 and the SBR processing unit 276, respectively.

In the unpacking/decoding unit 161 illustrated in FIG. 25, the SBR processing unit 273 performs the SBR on the audio signal supplied from the overlap adding unit 271 based on the high frequency power value supplied from the priority information acquisition unit 191, and supplies the resulting audio signal to the gain adjustment unit 272. In this case, the SBR processing unit 273 does not adjust the gain of the high frequency power value.

The gain adjustment unit 272 adjusts the gain of the audio signal supplied from the SBR processing unit 273 based on the priority information supplied from the priority information acquisition unit 191, and supplies the audio signal to the mixing unit 163.

The SBR processing unit 276 performs the SBR on the audio signal supplied from the overlap adding unit 274 based on the high frequency power value supplied from the priority information acquisition unit 191, and supplies the resulting audio signal to the gain adjustment unit 275. In this case, the SBR processing unit 276 does not adjust the gain of the high frequency power value.

The gain adjustment unit 275 adjusts the gain of the audio signal supplied from the SBR processing unit 276 based on the priority information supplied from the priority information acquisition unit 191, and supplies the audio signal to the rendering unit 162.

<Description of Selective Decoding Processing>

Subsequently, the operation of the decoding device 151 in a case where the unpacking/decoding unit 161 has the configuration illustrated in FIG. 25 will be described. In this case, the decoding device 151 performs the decoding processing described with reference to FIG. 11. However, the processing illustrated in FIG. 26 is performed as the selective decoding processing in STEP S52.

Hereinafter, the selective decoding processing corresponding to the processing in STEP S52 in FIG. 11 will be described with reference to the flowchart in FIG. 26. The processing tasks in STEP S231 to STEP S238 are the same as the processing tasks in STEP S181 to STEP S188 in FIG. 22, and the description thereof will not be repeated. However, in STEP S232, the priority information is not supplied to the SBR processing unit 273 and the SBR processing unit 276.

In STEP S239, the SBR processing unit 273 performs the SBR on the audio signal supplied from the overlap adding unit 271 based on the high frequency power value supplied from the priority information acquisition unit 191, and supplies the resulting audio signal to the gain adjustment unit 272.

In STEP S240, the gain adjustment unit 272 adjusts the gain of the audio signal supplied from the SBR processing unit 273 based on the priority information of the channel subject to be processed supplied from the priority information acquisition unit 191, and supplies the audio signal to the mixing unit 163.

Specifically, in a case where the priority information of the time frame immediately before the present time frame is equal to or higher than the threshold value P and the priority information of the present time frame and the priority information of the time frame immediately after the present time frame are lower than the threshold value P, the gain adjustment unit 272 adjusts the gain of the audio signal with the fading signal gain illustrated by the polygonal line GN31 in FIG. 23. In this case, the time frame (n) in FIG. 23 corresponds to the present time frame, and in the time frame immediately after the present time frame, the gain adjustment with a fading signal gain of zero is performed, as illustrated by the polygonal line GN31.

In addition, in a case where the priority information of the present time frame is equal to or higher than the threshold value P and the priority information items of the two time frames immediately before the present time frame are lower than the threshold value P, the gain adjustment unit 272 adjusts the gain of the audio signal with the fading signal gain illustrated by the polygonal line GN41 in FIG. 24. In this case, the time frame (n+2) in FIG. 24 corresponds to the present time frame, and in the time frame immediately before the present time frame, the gain adjustment with a fading signal gain of zero is performed, as illustrated by the polygonal line GN41.

The gain adjustment unit 272 performs the gain adjustment only in the two cases described above; in other cases, it does not perform the gain adjustment and supplies the audio signal to the mixing unit 163 as it is.

After the gain adjustment of the audio signal is performed, the processing tasks in STEP S241 to STEP S247 are performed. However, those processing tasks are similar to those in STEP S191 to STEP S197 in FIG. 22, and the description thereof will not be repeated.

In this way, when the audio signal of the present time frame of the object subject to be processed is obtained, the processing tasks in STEP S248 and STEP S249 are performed and the final audio signal of the present time frame is obtained. However, those processing tasks are similar to those in STEP S239 and STEP S240, and the description thereof will not be repeated.

In STEP S250, the object audio signal acquisition unit 197 adds one to the object number, and the process returns to STEP S243. Then, when it is determined in STEP S243 that the object number is not less than N, the selective decoding processing ends, and the process proceeds to STEP S53 in FIG. 11.

As described above, the unpacking/decoding unit 161 adjusts the gain of the audio signal obtained by the SBR based on the priority information items of three consecutive time frames. In this way, the occurrence of glitch noise is suppressed with simpler processing, and thus, it is possible to suppress the deterioration of the sound quality when listening.

In the present embodiment, an example is described in which the output destination of the MDCT coefficient is selected using the priority information items of three time frames and the gain adjustment is performed with the fading signal gain. However, only the gain adjustment with the fading signal gain may be performed.

In such a case, in the output selection unit 194 and the output selection unit 199, the output destination of the MDCT coefficient is selected by processing similar to that in the first embodiment. Then, in the gain adjustment unit 272 and the gain adjustment unit 275, in a case where the priority information of the present time frame is lower than the threshold value, the fade-in processing or the fade-out processing is performed by linearly increasing or decreasing the fading signal gain of the present time frame. Here, whether the fade-in processing or the fade-out processing is to be performed may be determined from the priority information of the present time frame and the priority information items of the time frames immediately before and after the present time frame.

Sixth Embodiment

<Fade-In Processing and Fade-Out Processing>

Incidentally, in the rendering unit 162, for example, the VBAP is performed, and the audio signal of each channel for reproducing the sound of each object is generated from the audio signal of each object.

Specifically, in the VBAP, for each object, a gain value of the audio signal (hereinafter referred to as a VBAP gain) is calculated for each time frame and for each channel, that is, for each speaker that reproduces the sound. Then, the sum of the audio signals of the objects, each multiplied by its VBAP gain for a given channel (speaker), is the audio signal of that channel. In other words, with regard to each object, the VBAP gain calculated for each channel is assigned to that channel.
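The mixing step described above, in which each channel is the sum over objects of the object audio signal weighted by that object's VBAP gain for the channel, can be sketched as follows. This minimal version assumes the VBAP gains are already known and constant over the block; computing them from speaker and object positions is not shown, and the function name and data layout are hypothetical.

```python
from typing import List, Sequence


def mix_objects_to_channels(
    object_signals: Sequence[Sequence[float]],  # object_signals[o][t]
    vbap_gains: Sequence[Sequence[float]],      # vbap_gains[o][s]: gain of object o for speaker s
) -> List[List[float]]:
    """channels[s][t] = sum over objects o of vbap_gains[o][s] * object_signals[o][t]."""
    num_channels = len(vbap_gains[0])
    num_samples = len(object_signals[0])
    channels = [[0.0] * num_samples for _ in range(num_channels)]
    for signal, gains in zip(object_signals, vbap_gains):
        for s, g in enumerate(gains):
            for t, x in enumerate(signal):
                channels[s][t] += g * x
    return channels
```

Because every object reaches the speakers only through these gains, scaling an object's VBAP gains is enough to fade that object in or out, which is the idea the following paragraphs build on.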

Therefore, with regard to the audio signal of an object, the generation of glitch noise, and hence the deterioration of the sound quality when listening, may be suppressed by appropriately adjusting the VBAP gain instead of adjusting the gain of the audio signal of the object or the high frequency power value.

In such a case, for example, linear interpolation or the like is performed on the VBAP gain of each time frame to calculate the VBAP gain of each sample of the audio signal in each time frame, and the audio signal of each channel is then generated using the obtained VBAP gains.

For example, the VBAP gain value of the first sample in the time frame subject to be processed is the VBAP gain value of the last sample in the time frame immediately before it. In addition, the VBAP gain value of the last sample in the time frame subject to be processed is the VBAP gain value calculated by the ordinary VBAP for the time frame subject to be processed.

Then, in the time frame subject to be processed, the VBAP gain value of each sample between the first sample and the last sample is determined such that the VBAP gain changes linearly from the first sample to the last sample.

However, in a case where the priority information of the time frame subject to be processed is lower than a threshold value, the calculation of the VBAP is not performed, and the VBAP gain value of each sample is determined such that the VBAP gain value of the last sample of the time frame subject to be processed becomes zero.
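The per-sample rule above (start from the last gain of the previous frame, end at the ordinary VBAP gain or at zero for a low-priority frame, interpolate linearly in between) can be sketched for one object and one speaker s. This is a minimal illustration assuming at least two samples per frame; the function names, the frame length, and the gain values are hypothetical, and `None` stands for a VBAP gain that is never computed because the frame's priority is below the threshold.

```python
from typing import List, Optional, Sequence


def frame_vbap_gains(prev_last_gain: float, vbap_gain: Optional[float],
                     priority: int, threshold: int, frame_len: int) -> List[float]:
    """Per-sample VBAP gains of one frame for one object and one speaker."""
    if frame_len < 2:
        raise ValueError("a frame needs at least two samples")
    # Low-priority frame: VBAP is skipped and the last sample is forced to zero.
    target = vbap_gain if priority >= threshold else 0.0
    step = (target - prev_last_gain) / (frame_len - 1)
    gains = [prev_last_gain + step * i for i in range(frame_len)]
    gains[-1] = target  # pin the last sample exactly
    return gains


def vbap_gain_track(priorities: Sequence[int], vbap_gains: Sequence[Optional[float]],
                    threshold: int, frame_len: int,
                    initial_gain: float = 0.0) -> List[List[float]]:
    """Chain frames so that each frame starts at the previous frame's last gain."""
    track, last = [], initial_gain
    for p, g in zip(priorities, vbap_gains):
        frame = frame_vbap_gains(last, g, p, threshold, frame_len)
        track.append(frame)
        last = frame[-1]
    return track
```

With a threshold of 4 and hypothetical priorities 7, 3, 2 for three consecutive frames, a frame length of 4 and an ordinary VBAP gain of 0.75 in the first frame, the track is constant at 0.75, then falls 0.75, 0.5, 0.25, 0.0, then stays at zero, matching the fade-out shape described for the polygonal line GN51.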

In this way, by performing the gain adjustment of the audio signal of each object through the VBAP gain, the gains of the low frequency component and the high frequency component can be adjusted at once, and the occurrence of glitch noise is suppressed with a smaller amount of processing; thus, it is possible to suppress the deterioration of the sound quality when listening.

In a case where the VBAP gain is determined for each sample as described above, the VBAP gain of each sample in each time frame is, for example, as illustrated in FIG. 27 and FIG. 28.

In FIG. 27 and FIG. 28, the parts corresponding to those in FIG. 19 and FIG. 20 are denoted by the same letters or the like, and the description thereof will not be repeated. In addition, in FIG. 27 and FIG. 28, “VBAP_gain[q][s]” (where q = n−1, n, n+1, n+2) indicates the VBAP gain of the time frame (q) of the object subject to be processed for the speaker index s, which specifies the speaker corresponding to the predetermined channel.

The example illustrated in FIG. 27 is an example in which the change of the priority information is the same as that in the example illustrated in FIG. 19. In this example, if the threshold value Q is 4, the priority information of the time frame (n−1) is equal to or higher than the threshold value Q. However, the priority information is lower than the threshold value Q in the time frame (n) to the time frame (n+2).

In such a case, the VBAP gains of the time frame (n−1) to the time frame (n+1) are, for example, as indicated by a polygonal line GN51.

In this example, since the priority information of the time frame (n−1) is equal to or higher than the threshold value Q, the VBAP gain of each sample is determined based on the VBAP gain calculated by the ordinary VBAP.

That is, the VBAP gain value of the first sample of the time frame (n−1) is the same as the VBAP gain value of the last sample of the time frame (n−2). In addition, with regard to the object subject to be processed, the VBAP gain value of the last sample of the time frame (n−1) is the VBAP gain value of the channel corresponding to the speaker s, calculated by the ordinary VBAP for the time frame (n−1). Then, the VBAP gain value of each sample of the time frame (n−1) is determined so as to change linearly from the first sample to the last sample.

In addition, since the priority information of the time frame (n) is lower than the threshold value Q, the VBAP gain value of the last sample of the time frame (n) is zero.

That is, the VBAP gain value of the first sample of the time frame (n) is the same as the VBAP gain value of the last sample of the time frame (n−1), and the VBAP gain value of the last sample of the time frame (n) is zero. Then, the VBAP gain value of each sample of the time frame (n) is determined so as to change linearly from the first sample to the last sample.

Furthermore, since the priority information of time frame (n+1) is lower than the threshold value Q, the VBAP gain value of the last sample of time frame (n+1) is zero. Since the first sample of time frame (n+1) takes over the zero gain of the last sample of time frame (n), the VBAP gain values of all the samples of time frame (n+1) are zero.

In this way, by setting to zero the VBAP gain value of the last sample of each time frame whose priority information is lower than the threshold value Q, fade-out processing equivalent to the example in FIG. 23 can be performed.

On the other hand, in the example illustrated in FIG. 28, the priority information changes in the same way as in the example illustrated in FIG. 24. In this example, with the threshold value Q set to 4, the priority information of time frames (n−1) through (n+1) is lower than the threshold value Q, but the priority information of time frame (n+2) is equal to or higher than the threshold value Q.

In this case, the VBAP gains of time frame (n−1) to time frame (n+2) are, for example, the gains indicated by the polygonal line GN61.

In this example, since both the priority information of time frame (n) and the priority information of time frame (n+1) are lower than the threshold value Q, the VBAP gains of all the samples of time frame (n+1) are zero.

In addition, since the priority information of time frame (n+2) is equal to or higher than the threshold value Q, the VBAP gain value of each sample for the object being processed is determined based on the VBAP gain of the channel corresponding to speaker s, calculated by ordinary VBAP.

That is, the VBAP gain value of the first sample of time frame (n+2) is zero, which is the VBAP gain value of the last sample of time frame (n+1), and the VBAP gain value of the last sample of time frame (n+2) is the VBAP gain value calculated by ordinary VBAP for time frame (n+2). The VBAP gain value of each sample of time frame (n+2) is then determined so as to change linearly from the first sample to the last sample.

In this way, by setting to zero the VBAP gain value of the last sample of each time frame whose priority information is lower than the threshold value Q, fade-in processing equivalent to the example in FIG. 24 can be performed.
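The fade-out and fade-in behavior of FIG. 27 and FIG. 28 can be sketched in Python as follows. This is a minimal sketch for one object and one speaker s, assuming the ordinary VBAP gain of each time frame is already known; the function name vbap_gain_per_sample, its parameters, and the sample values in the usage note are hypothetical and are not taken from the figures.

```python
from typing import List, Sequence


def vbap_gain_per_sample(
    priorities: Sequence[int],
    ordinary_gains: Sequence[float],
    threshold_q: int,
    frame_len: int,
    initial_gain: float = 0.0,
) -> List[List[float]]:
    """Per-sample VBAP gains of one object for one speaker s, frame by frame.

    priorities[q]     -- priority information of time frame q
    ordinary_gains[q] -- gain of speaker s from ordinary VBAP for frame q
    initial_gain      -- gain of the last sample of the frame preceding q = 0
    """
    if frame_len < 1:
        raise ValueError("frame_len must be at least 1")
    denom = max(frame_len - 1, 1)
    frames: List[List[float]] = []
    start = initial_gain
    for priority, vbap_gain in zip(priorities, ordinary_gains):
        # Last-sample target: the ordinary VBAP gain when the priority is
        # high enough, otherwise zero so that the object fades out.
        end = vbap_gain if priority >= threshold_q else 0.0
        # First sample = last sample of the previous frame; change linearly.
        frame = [start + (end - start) * k / denom for k in range(frame_len)]
        frame[-1] = end  # guard against floating-point drift
        frames.append(frame)
        start = end
    return frames
```

For example, with Q = 4, three samples per frame, an ordinary VBAP gain of 0.5 in every frame, and priorities [5, 2, 1, 2] for time frames (n−1) to (n+2) starting from a gain of 0.5, the frames come out as [0.5, 0.5, 0.5], [0.5, 0.25, 0.0], [0.0, 0.0, 0.0], and [0.0, 0.0, 0.0], mirroring the fade-out of FIG. 27. Priorities [1, 2, 3, 6] starting from a gain of zero give all-zero frames followed by the fade-in [0.0, 0.25, 0.5] in time frame (n+2), mirroring FIG. 28.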

<Configuration Example of Unpacking/Decoding Unit>

In a case where the gain adjustment is performed by the fade-in processing or the fade-out processing described above with reference to FIG. 27 and FIG. 28, the unpacking/decoding unit 161 is configured, for example, as illustrated in FIG. 29. In FIG. 29, parts corresponding to those in FIG. 25 are denoted by the same reference signs, and their description will not be repeated.

The unpacking/decoding unit 161 illustrated in FIG. 29 includes the priority information acquisition unit 191, the channel audio signal acquisition unit 192, the channel audio signal decoding unit 193, the output selection unit 194, the zero value output unit 195, the IMDCT unit 196, the overlap adding unit 271, the SBR processing unit 273, the gain adjustment unit 272, the object audio signal acquisition unit 197, the object audio signal decoding unit 198, the output selection unit 199, the zero value output unit 200, the IMDCT unit 201, the overlap adding unit 274, and the SBR processing unit 276.

The configuration of the unpacking/decoding unit 161 illustrated in FIG. 29 differs from that of the unpacking/decoding unit 161 illustrated in FIG. 25 in that the gain adjustment unit 275 is not provided, and is otherwise the same as that illustrated in FIG. 25.

In the unpacking/decoding unit 161 illustrated in FIG. 29, the SBR processing unit 276 performs SBR on the audio signal supplied from the overlap adding unit 274 based on the high frequency power value supplied from the priority information acquisition unit 191, and supplies the resulting audio signal to the rendering unit 162.

In addition, the priority information acquisition unit 191 acquires the meta-data and the priority information of each object from the supplied bit stream and supplies them to the rendering unit 162. The priority information of each object is also supplied to the output selection unit 199.

<Description of Decoding Processing>

Subsequently, the operation of the decoding device 151 in a case where the unpacking/decoding unit 161 has the configuration illustrated in FIG. 29 will be described.

The decoding device 151 performs the decoding processing illustrated in FIG. 30. Hereinafter, this decoding processing will be described with reference to the flow chart in FIG. 30. Note that in STEP S281, the same processing as that in STEP S51 in FIG. 11 is performed, and its description will not be repeated.

In STEP S282, the unpacking/decoding unit 161 performs the selective decoding processing.

Here, the selective decoding processing corresponding to STEP S282 in FIG. 30 will be described with reference to the flow chart in FIG. 31.

The processing in STEP S311 to STEP S328 is the same as that in STEP S231 to STEP S248 in FIG. 26, and its description will not be repeated. However, in STEP S312, the priority information acquisition unit 191 also supplies the priority information acquired from the bit stream to the rendering unit 162.

In STEP S329, the object audio signal acquisition unit 197 adds one to the object number, and the process returns to STEP S323. Then, when it is determined in STEP S323 that the object number is not less than N, the selective decoding processing ends, and the process proceeds to STEP S283 in FIG. 30.

Therefore, in the selective decoding processing illustrated in FIG. 31, the audio signal of each channel undergoes gain adjustment by the fading signal gain as in the fifth embodiment, whereas for each object no gain adjustment is performed and the audio signal obtained by SBR is output to the rendering unit 162 as it is.

Returning to the description of the decoding processing in FIG. 30, in STEP S283, the rendering unit 162 renders the audio signal of each object based on the audio signal of each object supplied from the SBR processing unit 276, the position information of each object supplied as meta-data from the priority information acquisition unit 191, and the priority information of the present time frame of each object.

For example, as described above with reference to FIG. 27 and FIG. 28, for each channel, the rendering unit 162 calculates the VBAP gain of each sample of the present time frame based on the priority information of the present time frame of each object and the VBAP gain of the last sample of the time frame immediately before the present time frame. At this time, the rendering unit 162 calculates the VBAP gain by VBAP based on the position information as appropriate.

Then, the rendering unit 162 generates the audio signal of each channel based on the per-sample VBAP gain of each channel calculated for each object and on the audio signal of each object, and supplies the resulting audio signals to the mixing unit 163.
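The channel generation in STEP S283 can be sketched as a weighted sum: each channel signal is the sum over objects of the object's audio signal multiplied, sample by sample, by that object's VBAP gain for the channel. This is a minimal sketch using plain Python lists; the function name render_objects and the data layout gains[o][c][t] are assumptions made for illustration.

```python
from typing import List, Sequence


def render_objects(
    object_signals: Sequence[Sequence[float]],
    gains: Sequence[Sequence[Sequence[float]]],
    num_channels: int,
) -> List[List[float]]:
    """Mix object signals into channel signals using per-sample gains.

    object_signals[o][t] -- audio signal of object o after SBR
    gains[o][c][t]       -- per-sample VBAP gain of object o for channel c
    """
    num_samples = len(object_signals[0]) if object_signals else 0
    channels = [[0.0] * num_samples for _ in range(num_channels)]
    for signal, object_gains in zip(object_signals, gains):
        for c in range(num_channels):
            gain = object_gains[c]
            out = channels[c]
            for t in range(num_samples):
                # A zero gain (e.g. a faded-out low-priority object)
                # contributes nothing to this channel.
                out[t] += gain[t] * signal[t]
    return channels
```

Because the priority-driven fading is folded into the gains, the object signals themselves pass through unchanged, which matches the absence of an object-side gain adjustment unit in FIG. 29.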

Here, in the description above, the VBAP gains are calculated so as to change linearly across the samples of each time frame. However, the VBAP gain may change non-linearly. In addition, in the description above, the audio signal of each channel is generated by VBAP. However, even in a case where the audio signal of each channel is generated by another method, the gain of the audio signal of each object can be adjusted by processing similar to that in the case of VBAP.
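As one possible non-linear alternative, which the text leaves unspecified, the gain could follow a raised-cosine curve between the same start and end values used by the linear rule. The sketch below is an illustrative assumption rather than the method described here, and cosine_ramp is a hypothetical name.

```python
import math
from typing import List


def cosine_ramp(start: float, end: float, frame_len: int) -> List[float]:
    """Raised-cosine ramp of gains from `start` to `end` over one frame."""
    if frame_len < 1:
        raise ValueError("frame_len must be at least 1")
    denom = max(frame_len - 1, 1)
    ramp = []
    for k in range(frame_len):
        # Weight rises monotonically from 0 to 1 with zero slope at both ends.
        w = 0.5 - 0.5 * math.cos(math.pi * k / denom)
        ramp.append(start + (end - start) * w)
    ramp[-1] = end  # last sample hits the target exactly
    return ramp
```

Compared with a linear ramp, the raised-cosine curve is flat at both ends of the frame, which smooths the change in slope where consecutive frames meet.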

After the audio signal of each channel is generated, the processing in STEP S284 is performed, and the decoding processing ends. Since the processing in STEP S284 is the same as that in STEP S54 in FIG. 11, its description will not be repeated.

In this way, the decoding device 151 calculates the per-sample VBAP gain of each object based on its priority information, and adjusts the gain of the audio signal of the object with this VBAP gain when generating the audio signal of each channel. As a result, glitch noise is suppressed with a smaller amount of processing, and deterioration of the sound quality perceived when listening can be suppressed.

In the descriptions of the fourth to sixth embodiments, the output destination of the MDCT coefficients is selected, or the gain adjustment by the fading signal gain or the like is performed, using the priority information of the time frames immediately before and after the present time frame. However, the present technology is not limited thereto: the priority information of the present time frame may be used together with the priority information of a predetermined number of time frames before the present time frame or of a predetermined number of time frames after the present time frame.
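A decision that uses a configurable number of time frames before and after the present time frame can be sketched as below. The rule itself, decoding when any priority in the window is equal to or higher than Q, is an assumption chosen for illustration, and should_decode and its parameters are hypothetical names.

```python
from typing import Sequence


def should_decode(
    priorities: Sequence[int],
    frame: int,
    threshold_q: int,
    frames_before: int = 1,
    frames_after: int = 1,
) -> bool:
    """Return True if time frame `frame` of a channel/object should be decoded.

    Assumed rule: decode when the priority of the present frame, or of any of
    `frames_before` earlier / `frames_after` later frames, is >= threshold_q.
    """
    lo = max(0, frame - frames_before)
    hi = min(len(priorities) - 1, frame + frames_after)
    return any(priorities[i] >= threshold_q for i in range(lo, hi + 1))
```

With frames_before = frames_after = 1 the window covers only the immediately preceding and following frames; larger values correspond to the predetermined number of frames mentioned above.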

Incidentally, the series of processing tasks described above can be executed by hardware or by software. In a case where the series of processing tasks is executed by software, a program constituting the software is installed in a computer. Here, the computer includes, for example, a computer built into dedicated hardware, or a general-purpose computer capable of executing various functions when various programs are installed.

FIG. 32 is a block diagram illustrating a hardware configuration example of a computer that executes the series of processing tasks described above by a program.

In the computer, a central processing unit (CPU) 501, a read only memory (ROM) 502, and a random access memory (RAM) 503 are connected to each other by a bus 504.

Furthermore, an input-output interface 505 is connected to the bus 504. An input unit 506, an output unit 507, a storage unit 508, a communication unit 509, and a drive 510 are connected to the input-output interface 505.

The input unit 506 includes a keyboard, a mouse, a microphone, and an imaging element. The output unit 507 includes a display and speakers. The storage unit 508 includes a hard disk or a non-volatile memory. The communication unit 509 includes a network interface or the like. The drive 510 drives a removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

In the computer configured as described above, the CPU 501 loads the program stored in the storage unit 508 into the RAM 503 via the input-output interface 505 and the bus 504 and executes it, whereby the series of processing tasks described above is performed.

The program executed by the computer (the CPU 501) can be provided by being recorded on the removable medium 511 as a packaged medium or the like. In addition, the program can be provided via a wired or wireless transmission medium such as a local area network, the internet, or digital satellite broadcasting.

In the computer, the program can be installed in the storage unit 508 via the input-output interface 505 by mounting the removable medium 511 on the drive 510. In addition, the program can be received by the communication unit 509 via the wired or wireless transmission medium and installed in the storage unit 508. Furthermore, the program can be installed in the ROM 502 or the storage unit 508 in advance.

The program executed by the computer may be a program in which the processing tasks are performed in time series in the order described herein, or a program in which the processing tasks are performed in parallel or at necessary timing, such as when a call is made.

In addition, embodiments of the present technology are not limited to the embodiments described above, and various modifications can be made without departing from the spirit of the present technology.

For example, the present technology can take a cloud computing configuration in which one function is shared and processed cooperatively by a plurality of devices via a network.

In addition, each STEP described in the above flow charts can be executed by one device or shared among a plurality of devices.

Furthermore, in a case where a plurality of processing tasks are included in one STEP, the processing tasks included in that STEP can be executed by one device or shared among a plurality of devices.

In addition, the effects described herein are merely examples and are not limiting, and there may be other effects.

Furthermore, the present technology can have the configurations described below.

(1) A decoding device comprising:

at least one circuit configured to:

acquire one or more encoded audio signals including a plurality of channels and/or a plurality of objects and priority information for each of the plurality of channels and/or the plurality of objects; and

decode the one or more encoded audio signals according to the priority information.

(2) The decoding device according to above (1), wherein the at least one circuit is configured to decode according to the priority information at least in part by decoding at least one of the one or more encoded audio signals for which a priority degree indicated by the priority information is equal to or higher than a degree, and refraining from decoding at least one other of the one or more encoded audio signals for which a priority degree indicated by the priority information is less than the degree.

(3) The decoding device according to above (2), wherein the at least one circuit is configured to change the degree based at least in part on the priority information for the plurality of channels and/or the plurality of objects.

(4) The decoding device according to any one of above (1) to (3), wherein:

the at least one circuit is configured to acquire a plurality of sets of priority information for the one or more encoded audio signals, and

wherein the at least one circuit is configured to decode the one or more encoded audio signals at least in part by selecting one of the sets of priority information and decoding based at least in part on the one set of priority information.

(5) The decoding device according to above (4), wherein the at least one circuit is configured to select the one of the sets of priority information according to a calculation capability of the decoding device.

(6) The decoding device according to any one of above (1) to (5), wherein the at least one circuit is further configured to generate the priority information based at least in part on the encoded audio signal.

(7) The decoding device according to above (6), wherein the at least one circuit is configured to generate the priority information based at least in part on a sound pressure or a spectral shape of the audio of the one or more encoded audio signals.

(8) The decoding device according to any one of above (1) to (7), wherein:

the priority information for the plurality of channels and/or the plurality of objects comprises, for at least one first channel of the plurality of channels and/or at least one first object of the plurality of objects, priority information indicating different priority degrees of the at least one first channel and/or at least one first object over a period of time; and

the at least one circuit is configured to decode based on the priority information at least in part by determining, for the first channel and/or the first object and at a first time during the period of time, whether or not to decode the first channel and/or the first object at the first time based at least in part on a priority degree for the first channel and/or the first object at the first time and a priority degree for the first channel and/or the first object at another time before or after the first time and during the period of time.

(9) The decoding device according to any one of above (1) to (8), wherein the at least one circuit is further configured to:

generate an audio signal for a first time at least in part by adding an output audio signal for a channel or object at the time and an output audio signal of the channel or object at a second time before or after the first time, wherein the output audio signal for the channel or object for a time is a signal obtained by the at least one circuit as a result of decoding in a case where decoding of the channel or object for the time is performed and is zero data in a case where decoding of the channel or object for the time is not performed; and

perform a gain adjustment of the output audio signal of the channel or object at the time based on the priority information of the channel or object at the time and the priority information of the channel or object at the other time before or after the time.

(10) The decoding device according to above (9), wherein the at least one circuit is further configured to:

adjust a gain of a high frequency power value for the channel or object based on the priority information of the channel or object at the first time and the priority information of the channel or object at the second time before or after the first time, and generate a high frequency component of the audio signal for the first time based on the high frequency power value of which the gain is adjusted and the audio signal of the time.

(11) The decoding device according to above (9) or (10), wherein the at least one circuit is further configured to:

generate, for each channel or each object, an audio signal of the first time in which a high frequency component is included, based on a high frequency power value and the audio signal of the time,

perform the gain adjustment of the audio signal of the first time in which the high frequency component is included.

(12) The decoding device according to any one of above (1) to (11), wherein the at least one circuit is further configured to assign an audio signal of a first object, of the plurality of objects, to each of at least some of the plurality of channels with a gain value based on the priority information and to generate the audio of each of the plurality of channels.

(13) A decoding method comprising:

acquiring priority information for each of a plurality of channels and/or a plurality of objects of one or more encoded audio signals; and

decoding the plurality of channels and/or the plurality of objects according to the priority information.

(14) At least one non-transitory computer-readable storage medium having encoded thereon executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method comprising:

acquiring priority information for each of a plurality of channels and/or a plurality of objects of one or more encoded audio signals; and

decoding the plurality of channels and/or the plurality of objects according to the priority information.

(15) An encoding device comprising:

at least one circuit configured to:

generate priority information for each of a plurality of channels and/or a plurality of objects of an audio signal; and

store the priority information in a bit stream.

(16) The encoding device according to above (15), wherein the at least one circuit is configured to generate the priority information at least in part by generating a plurality of sets of priority information for each of the plurality of channels and/or plurality of objects.

(17) The encoding device according to above (16), wherein the at least one circuit is configured to generate the plurality of sets of priority information for each of a plurality of calculation capabilities of decoding devices.

(18) The encoding device according to any one of above (15) to (17), wherein the at least one circuit is configured to generate the priority information based at least in part on a sound pressure or a spectral shape of the audio signal.

(19) The encoding device according to any one of above (15) to (18), wherein: the at least one circuit is further configured to encode audio signals of the plurality of channels and/or the plurality of objects of the audio signal to form an encoded audio signal; and the at least one circuit is further configured to store the priority information and the encoded audio signal in the bit stream.

(20) An encoding method comprising:

generating priority information for each of a plurality of channels and/or a plurality of objects of an audio signal; and

storing the priority information in a bit stream.

(21) At least one non-transitory computer-readable storage medium having encoded thereon executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method comprising:

generating priority information for each of a plurality of channels and/or a plurality of objects of an audio signal; and

storing the priority information in a bit stream.
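Configurations (2), (5), and (17) can be illustrated together with a small sketch: the decoder picks one of several sets of priority information according to its calculation capability and then decodes only the channels or objects whose priority degree is equal to or higher than a given degree. The integer capability levels, the dictionary layout, the fallback to the lowest level, and the names select_priority_set and indices_to_decode are all hypothetical; the text does not specify how calculation capability is expressed.

```python
from typing import Dict, List, Sequence


def select_priority_set(
    priority_sets: Dict[int, Sequence[int]],
    capability: int,
) -> Sequence[int]:
    """Pick the set generated for the largest capability level <= capability.

    priority_sets maps a hypothetical capability level to one priority value
    per channel/object. Falls back to the lowest level when the device is
    below every level. Raises ValueError if priority_sets is empty.
    """
    eligible = [level for level in priority_sets if level <= capability]
    level = max(eligible) if eligible else min(priority_sets)
    return priority_sets[level]


def indices_to_decode(priorities: Sequence[int], degree: int) -> List[int]:
    """Indices of channels/objects whose priority degree is >= degree."""
    return [i for i, p in enumerate(priorities) if p >= degree]
```

With a fixed degree, a set generated for a more capable decoder can simply assign higher priority values to more channels or objects, so that more of them are decoded.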

It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and alterations may occur depending on design requirements and other factors insofar as they are within the scope of the appended claims or the equivalents thereof.

REFERENCE SIGNS LIST

11 encoding device

21 channel audio encoding unit

22 object audio encoding unit

23 meta-data input unit

24 packing unit

51 encoding unit

52 priority information generation unit

61 MDCT unit

91 encoding unit

92 priority information generation unit

101 MDCT unit

151 decoding device

161 unpacking/decoding unit

162 rendering unit

163 mixing unit

191 priority information acquisition unit

193 channel audio signal decoding unit

194 output selection unit

196 IMDCT unit

198 object audio signal decoding unit

199 output selection unit

201 IMDCT unit

231 priority information generation unit

232 priority information generation unit

271 overlap adding unit

272 gain adjustment unit

273 SBR processing unit

274 overlap adding unit

275 gain adjustment unit

276 SBR processing unit

1. A decoding device comprising: at least one circuit configured to: acquire one or more encoded audio signals including a plurality of channels and/or a plurality of objects and priority information for each of the plurality of channels and/or the plurality of objects; and decode the one or more encoded audio signals according to the priority information.
2. The decoding device according to claim 1, wherein the at least one circuit is configured to decode according to the priority information at least in part by decoding at least one of the one or more encoded audio signals for which a priority degree indicated by the priority information is equal to or higher than a degree, and refraining from decoding at least one other of the one or more encoded audio signals for which a priority degree indicated by the priority information is less than the degree.
3. The decoding device according to claim 2, wherein the at least one circuit is configured to change the degree based at least in part on the priority information for the plurality of channels and/or the plurality of objects.
4. The decoding device according to claim 1, wherein: the at least one circuit is configured to acquire a plurality of sets of priority information for the one or more encoded audio signals, and wherein the at least one circuit is configured to decode the one or more encoded audio signals at least in part by selecting one of the sets of priority information and decoding based at least in part on the one set of priority information.
5. The decoding device according to claim 4, wherein the at least one circuit is configured to select the one of the sets of priority information according to a calculation capability of the decoding device.
6. The decoding device according to claim 1, wherein the at least one circuit is further configured to generate the priority information based at least in part on the encoded audio signal.
7. The decoding device according to claim 6, wherein the at least one circuit is configured to generate the priority information based at least in part on a sound pressure or a spectral shape of the audio of the one or more encoded audio signals.
8. The decoding device according to claim 1, wherein: the priority information for the plurality of channels and/or the plurality of objects comprises, for at least one first channel of the plurality of channels and/or at least one first object of the plurality of objects, priority information indicating different priority degrees of the at least one first channel and/or at least one first object over a period of time; and the at least one circuit is configured to decode based on the priority information at least in part by determining, for the first channel and/or the first object and at a first time during the period of time, whether or not to decode the first channel and/or the first object at the first time based at least in part on a priority degree for the first channel and/or the first object at the first time and a priority degree for the first channel and/or the first object at another time before or after the first time and during the period of time.
9. The decoding device according to claim 1, wherein the at least one circuit is further configured to: generate an audio signal for a first time at least in part by adding an output audio signal for a channel or object at the time and an output audio signal of the channel or object at a second time before or after the first time, wherein the output audio signal for the channel or object for a time is a signal obtained by the at least one circuit as a result of decoding in a case where decoding of the channel or object for the time is performed and is zero data in a case where decoding of the channel or object for the time is not performed; and perform a gain adjustment of the output audio signal of the channel or object at the time based on the priority information of the channel or object at the time and the priority information of the channel or object at the other time before or after the time.
10. The decoding device according to claim 9, wherein the at least one circuit is further configured to: adjust a gain of a high frequency power value for the channel or object based on the priority information of the channel or object at the first time and the priority information of the channel or object at the second time before or after the first time, and generate a high frequency component of the audio signal for the first time based on the high frequency power value of which the gain is adjusted and the audio signal of the time.
11. The decoding device according to claim 9, wherein the at least one circuit is further configured to: generate, for each channel or each object, an audio signal of the first time in which a high frequency component is included, based on a high frequency power value and the audio signal of the time, perform the gain adjustment of the audio signal of the first time in which the high frequency component is included.
12. The decoding device according to claim 1, wherein the at least one circuit is further configured to assign an audio signal of a first object, of the plurality of objects, to each of at least some of the plurality of channels with a gain value based on the priority information and to generate the audio of each of the plurality of channels.
13. A decoding method comprising: acquiring priority information for each of a plurality of channels and/or a plurality of objects of one or more encoded audio signals; and decoding the plurality of channels and/or the plurality of objects according to the priority information.
14. At least one non-transitory computer-readable storage medium having encoded thereon executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method comprising: acquiring priority information for each of a plurality of channels and/or a plurality of objects of one or more encoded audio signals; and decoding the plurality of channels and/or the plurality of objects according to the priority information.
15. An encoding device comprising: at least one circuit configured to: generate priority information for each of a plurality of channels and/or a plurality of objects of an audio signal; and store the priority information in a bit stream.
16. The encoding device according to claim 15, wherein the at least one circuit is configured to generate the priority information at least in part by generating a plurality of sets of priority information for each of the plurality of channels and/or plurality of objects.
17. The encoding device according to claim 16, wherein the at least one circuit is configured to generate the plurality of sets of priority information for each of a plurality of calculation capabilities of decoding devices.
18. The encoding device according to claim 15, wherein the at least one circuit is configured to generate the priority information based at least in part on a sound pressure or a spectral shape of the audio signal.
19. The encoding device according to claim 15, wherein: the at least one circuit is further configured to encode audio signals of the plurality of channels and/or the plurality of objects of the audio signal to form an encoded audio signal; and the at least one circuit is further configured to store the priority information and the encoded audio signal in the bit stream.
20. An encoding method comprising: generating priority information for each of a plurality of channels and/or a plurality of objects of an audio signal; and storing the priority information in a bit stream.
21. At least one non-transitory computer-readable storage medium having encoded thereon executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method comprising: generating priority information for each of a plurality of channels and/or a plurality of objects of an audio signal; and storing the priority information in a bit stream.